PROSTATE CANCER-RELATED GENE 3 (PG-3) AND BIALLELIC MARKERS THEREOF 

FIELD OF THE INVENTION 

The present invention is directed to polynucleotides encoding a PG-3 polypeptide, as well as to the regulatory regions located at the 5'- and 3'-ends of said coding region. The invention also relates to polypeptides encoded by the PG-3 gene, and to antibodies directed specifically against such polypeptides that are useful as diagnostic reagents. The invention further encompasses biallelic markers of the PG-3 gene useful in genetic analysis. 

BACKGROUND OF THE INVENTION 

Cancer is one of the leading causes of death in industrialized countries, which makes it a serious public health burden, especially in view of the aging of the population. The number of people developing cancer is expected to increase dramatically over the next 25 years: globally, 10 million new cancer patients are diagnosed each year, and 20 million new cancer diagnoses are projected for the year 2020. 

In spite of the large number of available therapeutic techniques, including but not limited to surgery, chemotherapy, radiotherapy and bone marrow transplantation, and in spite of encouraging results obtained with experimental protocols in immunotherapy and gene therapy, the overall 5-year survival rate of cancer patients does not reach 50%. There is therefore a strong need both for reliable diagnostic procedures enabling early-stage cancer prognosis and for preventive and curative treatments of the disease. 

A cancer is a clonal proliferation of cells produced as a consequence of cumulative genetic damage that finally results in unrestrained cell growth, tissue invasion and metastasis (cell transformation). Regardless of the type of cancer, transformed cells carry damaged DNA in the form of gross chromosomal translocations or, more subtly, DNA amplifications, rearrangements or even point mutations. 

Cancer is caused by the dysregulation of the expression of certain genes. The development of a tumor requires a long succession of steps, each of which comprises the dysregulation of a gene involved either in cell cycle activity or in genomic stability, and the emergence of an abnormal mutated clone that overwhelms the other, normal cell types because of a proliferative advantage. Cancer thus arises from a combination of two mechanisms. Some mutations enhance cell proliferation, increasing the target population of cells for the next mutation. Other mutations affect the stability of the entire genome, increasing the overall mutation rate, as in the case of mutations in mismatch repair proteins (reviewed in Arnheim N & Shibata D, 1997). 

Recent studies have identified three groups of genes that are frequently mutated in cancer. The first two groups are involved in cell cycle activity, the mechanism that drives normal cell proliferation and ensures the normal development and homeostasis of the organism. Conversely, many of the properties of cancer cells (uncontrolled proliferation, increased mutation rate, abnormal translocations and gene amplifications) can be attributed directly to perturbations of the normal regulation or progression of the cell cycle. 

The first group of genes, called oncogenes, are genes whose products activate cell proliferation. The normal non-mutant versions are called protooncogenes. The mutated forms are excessively or inappropriately active in promoting cell proliferation and act in the cell in a dominant way, such that a single mutant allele is enough to affect the cell phenotype. Activated oncogenes are rarely transmitted as germline mutations, since they would probably be lethal if expressed in all the cells of the organism. Therefore, oncogenes can only be investigated in tumor tissues. Oncogenes and protooncogenes can be classified into several different categories according to their function. This classification includes genes that code for proteins involved in signal transduction, such as growth factors (e.g., sis, int-2); receptor and non-receptor protein-tyrosine kinases (e.g., erbB, src, bcr-abl, met, trk); membrane-associated G proteins (e.g., ras); cytoplasmic protein kinases (e.g., the mitogen-activated protein kinase (MAPK) family, raf, mos, pak); or nuclear transcription factors (e.g., myc, myb, fos, jun, ret) (for review, see Hunter T, 1991; Fanger GR et al., 1997; Weiss FU et al., 1997). 

The second group of genes frequently mutated in cancer, called tumor suppressor genes, are genes whose products inhibit cell growth. Mutant versions in cancer cells have lost their normal function and act in the cell in a recessive way, such that both copies of the gene must be inactivated in order to change the cell phenotype. Most importantly, the tumor phenotype can be rescued by the wild-type allele, as shown by the cell fusion experiments first described by Harris and colleagues (Harris H et al., 1969). Germline mutations of tumor suppressor genes are transmitted and can thus be studied in both constitutional and tumor DNA from familial or sporadic cases. The current family of tumor suppressors includes DNA-binding transcription factors (e.g., p53, WT1), transcription regulators (e.g., RB, APC and BRCA1) and protein kinase inhibitors (e.g., p16), among others (for review, see Haber D & Harlow E, 1997). 

The third group of genes frequently mutated in cancer, called mutator genes, are responsible for maintaining genome integrity and/or low mutation rates. Loss of function of both alleles increases cell mutation rates and, as a consequence, protooncogenes and tumor suppressor genes become mutated. Mutator genes can also be classified as tumor suppressor genes, except that tumorigenesis caused by this class of genes cannot be suppressed simply by restoring a wild-type allele, as described above. Genes whose inactivation may lead to a mutator phenotype include mismatch repair genes (e.g., MLH1, MSH2), DNA helicases (e.g., BLM, WRN) and other genes involved in DNA repair and genomic stability (e.g., p53 and possibly BRCA1 and BRCA2) (for review, see Haber D & Harlow E, 1997; Fishel & Wilson, 1997; Ellis, 1997). 

The recent development of sophisticated genetic mapping techniques has resulted in an ever-expanding list of genes associated with particular types of human cancers. The human haploid genome contains an estimated 80,000 to 100,000 genes scattered over about $3 \times 10^9$ base pairs of double-stranded DNA. Each human being is diploid, i.e., possesses two haploid genomes, one of paternal origin and the other of maternal origin. The sequence of a given genetic locus may vary between individuals in a population or between the two copies of the locus on the chromosomes of a single individual. Genetic mapping techniques often exploit these differences, which are called polymorphisms, to map the location of genes associated with human phenotypes. 

One mapping technique, called the loss of heterozygosity (LOH) technique, is often employed to detect genes whose loss of function results in cancer, such as the tumor suppressor genes described above. Tumor suppressor genes often produce cancer via a two-hit mechanism in which a first mutation, such as a point mutation (or a small deletion or insertion), inactivates one allele of the tumor suppressor gene. Often, this first mutation is inherited from generation to generation. A second mutation, often a spontaneous somatic mutation such as a deletion that removes all or part of the chromosome carrying the other copy of the tumor suppressor gene, results in a cell in which both copies of the tumor suppressor gene are inactive. As a consequence of the deletion in the tumor suppressor gene, one allele is lost for any genetic marker located close to the tumor suppressor gene. Thus, if the patient is heterozygous for such a marker, the tumor tissue loses heterozygosity, becoming homozygous or hemizygous. This loss of heterozygosity generally provides strong evidence for the existence of a tumor suppressor gene in the lost region. 
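The LOH reasoning above can be illustrated with a minimal Python sketch. For illustration only, it assumes that each genotype is given as the set of alleles detected at a marker in constitutional (normal) DNA and in tumor DNA. The function names are hypothetical, and real LOH calling typically relies on quantitative allelic-imbalance thresholds rather than the all-or-nothing comparison used here. A marker is informative only if the normal tissue is heterozygous. LOH is scored when the tumor shows a single allele, meaning it is homozygous or hemizygous.

```python
from typing import Iterable, Optional, Set, Tuple

def loh_status(normal_alleles: Set[str], tumor_alleles: Set[str]) -> Optional[bool]:
    """Qualitative LOH call at one marker (hypothetical helper).

    Returns None if the marker is uninformative (normal DNA homozygous),
    True if the tumor has lost one of the two constitutional alleles,
    and False if heterozygosity is retained in the tumor.
    """
    if len(normal_alleles) < 2:
        return None  # homozygous constitutional genotype: LOH cannot be seen
    # A tumor showing only one allele is homozygous or hemizygous for the marker.
    return len(tumor_alleles) < 2

def loh_fraction(cases: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Fraction of informative (normal-heterozygous) cases that show LOH."""
    calls = [loh_status(normal, tumor) for normal, tumor in cases]
    informative = [call for call in calls if call is not None]
    if not informative:
        raise ValueError("no informative cases")
    return sum(informative) / len(informative)

# Three hypothetical cases at one marker: LOH, retention, and uninformative.
cases = [({"A", "G"}, {"A"}), ({"C", "T"}, {"C", "T"}), ({"G"}, {"G"})]
print(loh_fraction(cases))  # 0.5
```

Only informative cases enter the denominator. This mirrors how LOH frequencies such as "45% of informative cases" are reported in the studies cited below.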

LOH studies have allowed the identification of several chromosomal regions associated with cancer. Indeed, substantial amounts of LOH data support the hypothesis that genes associated with distinct cancer types are located within the 8p23 region of the human genome. Several regions of chromosome arm 8p were found to be frequently deleted in a variety of human malignancies, including those of the prostate, head and neck, lung and colon. Emi et al. demonstrated the involvement of the 8p23.1-8p21.3 region in cases of hepatocellular carcinoma, colorectal cancer and non-small cell lung cancer (Emi et al., 1992). Yaremko et al. (1994) showed the existence of two major regions of LOH for chromosome 8 markers in a sample of 87 colorectal carcinomas. The most prominent loss was found for 8p23.1-pter, where 45% of informative cases demonstrated loss of alleles. Scholnick et al. (Scholnick et al., 1996; Sunwoo et al., 1996) demonstrated the existence of three distinct regions of LOH for chromosome 8 markers in cases of squamous cell carcinoma of the supraglottic larynx. They showed that allelic loss of the 8p23 marker D8S264 serves as a statistically significant, independent predictor of poor prognosis for patients with supraglottic squamous cell carcinoma. The study of 51 squamous cell carcinomas of the head and neck and 29 oral squamous cell carcinoma cell lines showed frequent allelic loss and homozygous deletion at one or more loci located in the 8p23 region (Ishwad CS et al., 1999). In addition, a high-resolution deletion map of 150 squamous cell carcinomas of the larynx and oral cavity showed two distinct classes of deletion for the 8p23 region within the D8S264 to D8S1788 interval (Sunwoo et al., 1999). 
In other studies, Nagai et al. (1997) found the highest frequency of loss of heterozygosity in a specific region of 8p23 by genome-wide scanning of LOH in 120 cases of hepatocellular carcinoma (HCC). Further studies using high-density polymorphic marker analysis identified three minimal deleted areas on chromosome 8p. One of them was a 5 cM area in 8p23, probably indicating the presence of a tumor suppressor locus for HCC (Pineau P et al., 1999). Gronwald et al. (1997) also demonstrated 8p23-pter loss in renal clear cell carcinomas. 

The same region is involved in specific cases of prostate cancer. Matsuyama et al. (1994) showed specific deletion of the 8p23 band in prostate cancer cases, as monitored by FISH with the D8S7 probe. They documented a substantial number of cases with deletions of 8p23 but retention of the 8p22 marker LPL. Moreover, Ichikawa et al. (1996) deduced the existence of a prostate cancer metastasis suppressor gene and localized it to 8p23-q12 by studying metastasis suppression in highly metastatic rat prostate cells after transfer of human chromosomes. Recently, Washburn et al. (1997) found substantial numbers of tumors with allelic loss specific to 8p23 in LOH studies of 31 cases of human prostate cancer. In these samples, they were able to define the minimal overlapping region of deletion, covering the genetic interval D8S262-D8S277. In addition, using PCR analysis of polymorphic microsatellite repeat markers, 29% of 60 prostate tumors showed LOH at the D8S262 locus of the 8p23 region (Perinchery et al., 1999). 

Recent studies have also implicated the 8p23 region in other types of cancers, such as fibrous histiocytomas, ovarian adenocarcinomas and gastric cancers. Indeed, comparative genomic hybridization data showed the involvement of the 8p23.1 region in fibrous histiocytomas. These data also identified a minimal amplified region between D8S1819 and D8S550 containing the gene MASL1, whose overexpression might be oncogenic (Sakabe et al., 1999). LOH on 8p was also observed in 27 ovarian adenocarcinomas. Detailed examination of nine tumors with partial deletions defined three regions of overlap, including two in 8p23 (Wright et al., 1998). Comparative genomic hybridization of 58 primary gastric cancers detected gain of the 8p22-23 region in 24% of the tumors and even high-level amplification of the same region in 5% of the tumors. This amplified region was narrowed down to 8p23.1 by reverse-painting FISH to prophase chromosomes (Sakakura et al., 1999). 

The present invention relates to the Prostate Cancer Related Gene 3, or PG-3 gene, a gene present in the 8p23 cancer candidate region. It also relates to diagnostic methods and reagents for detecting alleles of the PG-3 gene which may cause cancer, and to therapies for treating cancer. 

SUMMARY OF THE INVENTION 
The present invention pertains to nucleic acid molecules comprising the genomic sequence and the cDNA sequence of a novel human gene which encodes a PG-3 protein. The PG-3 gene is localized in the 8p23 candidate region, which LOH studies have shown to be involved in several types of cancer. PG-3 presents homology with the BRCA1 gene, which is involved in transcriptional control through modulation of chromatin structure (Bochar et al., 2000). Mutations in BRCA1 are thought to be responsible for 45% of inherited breast cancer and more than 80% of inherited breast and ovarian cancer. In addition, BRCA1 carriers have a 4-fold increased risk of colon cancer, whereas male carriers face a 3-fold increased risk of prostate cancer. 

The PG-3 genomic sequence comprises regulatory sequences located upstream (5'-end) and downstream (3'-end) of the transcribed portion of said gene, these regulatory sequences also being part of the invention. 

The invention also relates to the cDNA sequence encoding the PG-3 protein, as well as to the corresponding translation product. 

Oligonucleotide probes or primers hybridizing specifically with a PG-3 genomic or cDNA sequence are also part of the present invention, as well as DNA amplification and detection methods using said primers and probes. 

A further object of the invention relates to recombinant vectors comprising any of the nucleic acid sequences described herein, and in particular to recombinant vectors comprising a PG-3 regulatory sequence or a sequence encoding a PG-3 protein. The present invention also relates to host cells and transgenic non-human animals comprising said nucleic acid sequences or recombinant vectors. 

The invention further encompasses biallelic markers of the PG-3 gene useful in genetic 
analysis. 

Finally, the invention is directed to methods for the screening of substances or molecules that inhibit the expression of PG-3. It is also directed to methods for the screening of substances or molecules that interact with a PG-3 polypeptide or that modulate the activity of a PG-3 polypeptide. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 is a block diagram of an exemplary computer system. 

Figure 2 is a flow diagram illustrating one embodiment of a process 200 for comparing a new nucleotide or protein sequence with a database of sequences in order to determine the homology levels between the new sequence and the sequences in the database. 

Figure 3 is a flow diagram illustrating one embodiment of a process 250 in a computer for 
determining whether two sequences are homologous. 

Figure 4 is a flow diagram illustrating one embodiment of an identifier process 300 for detecting the presence of a feature in a sequence. 

BRIEF DESCRIPTION OF THE SEQUENCES PROVIDED IN THE SEQUENCE LISTING 

SEQ ID No 1 is a genomic sequence of PG-3 comprising the 5' regulatory region (upstream untranscribed region), the exons and introns, and the 3' regulatory region (downstream untranscribed region). 

SEQ ID No 2 is a cDNA sequence of PG-3. 

SEQ ID No 3 is the amino acid sequence encoded by the cDNA of SEQ ID No 2. 
SEQ ID No 4 is a primer containing the additional PU 5' sequence further described in Example 2. 

SEQ ID No 5 is a primer containing the additional RP 5' sequence further described in Example 2. 

In accordance with the regulations relating to Sequence Listings, the following codes have been used in the Sequence Listing to indicate the locations of biallelic markers within the sequences and to identify each of the alleles present at the polymorphic base. The code "r" in the sequences indicates that one allele of the polymorphic base is a guanine, while the other allele is an adenine. The code "y" indicates that one allele of the polymorphic base is a thymine, while the other allele is a cytosine. The code "m" indicates that one allele of the polymorphic base is an adenine, while the other allele is a cytosine. The code "k" indicates that one allele of the polymorphic base is a guanine, while the other allele is a thymine. The code "s" indicates that one allele of the polymorphic base is a guanine, while the other allele is a cytosine. The code "w" indicates that one allele of the polymorphic base is an adenine, while the other allele is a thymine. The nucleotide code of the original allele for each biallelic marker is the following: 



Biallelic marker    Original allele
5-390-177           C
5-391-43            G
5-392-222           T
5-392-280           T
4-59-27             G
4-58-289            C
4-54-199            A
4-54-180            C
4-51-312            G
99-86-266           A
4-88-107            G
5-397-141           G
5-398-203           C
99-12738-248        A
99-109-358          C
99-12749-175        T
4-21-154            C
4-21-317            G
4-23-326            G
99-12753-34         A
5-364-252           G
99-12755-280        G
99-12755-329        C
4-87-212            A
99-12757-318        C
99-12758-102        G
99-12758-136        C
4-105-98            A
4-105-86            G
4-45-49             T
4-44-277            T
4-86-60             C
4-84-334            G
99-78-321           T
99-12767-36         G
99-12767-143        T
99-12767-189        T
99-12767-380        G
4-80-328            C
4-36-384            C
4-36-264            G
4-36-261            C
4-35-333            A
4-35-240            G
4-35-173            T
4-35-133            C
99-12771-59         T
99-12774-334        A
99-12776-358        G
99-12781-113        A
4-104-298           C
4-104-254           G
4-104-250           C
4-104-214           A
99-12818-289        T
99-24807-271        C
99-24807-84         G
99-12831-157        G
99-12831-241        C
99-12832-387        T
99-12836-30         G
99-12844-262        C
4-24-74             C
4-24-246            C
4-24-314            G
4-27-190            A
5-400-145           G
5-400-149           G
5-400-175           T
5-400-231           T
5-400-367           A
99-12852-110        T
99-12852-325        A
4-37-326            A
4-37-107            G
5-270-92            G
99-12860-47         G
99-12860-57         T
5-402-144           C
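The two-allele codes described above can be illustrated with a Python sketch. It maps each code to its pair of bases, derives the alternative allele of a marker from its original allele, and expands a sequence carrying one code into its two allelic versions. The helper names are hypothetical and not part of any sequence-listing software. The code-to-base assignments are those stated above, which coincide with the standard IUPAC nucleotide codes.

```python
from typing import Dict, List, Tuple

# Two-allele codes used in the Sequence Listing (standard IUPAC nucleotide codes).
BIALLELIC_CODES: Dict[str, Tuple[str, str]] = {
    "r": ("G", "A"),
    "y": ("T", "C"),
    "m": ("A", "C"),
    "k": ("G", "T"),
    "s": ("G", "C"),
    "w": ("A", "T"),
}

def alternative_allele(code: str, original: str) -> str:
    """Return the allele encoded by `code` that is not the original allele."""
    first, second = BIALLELIC_CODES[code.lower()]
    original = original.upper()
    if original == first:
        return second
    if original == second:
        return first
    raise ValueError(f"{original!r} is not one of the alleles encoded by {code!r}")

def expand_alleles(sequence: str) -> List[str]:
    """Expand a sequence containing exactly one biallelic code into two allele sequences."""
    positions = [i for i, base in enumerate(sequence) if base.lower() in BIALLELIC_CODES]
    if len(positions) != 1:
        raise ValueError("expected exactly one biallelic code in the sequence")
    i = positions[0]
    # Substituted alleles are written in upper case so the polymorphic base stands out.
    return [sequence[:i] + allele + sequence[i + 1:]
            for allele in BIALLELIC_CODES[sequence[i].lower()]]

# Hypothetical example: a marker written as "y" whose original allele is C.
print(alternative_allele("y", "C"))   # T
print(expand_alleles("acgyta"))       # ['acgTta', 'acgCta']
```

Combined with the original alleles listed in the table, this is enough to recover both alleles at each marker position. It requires that the code used for that marker is read from the Sequence Listing.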



In some instances, the polymorphic bases of the biallelic markers alter the identity of an amino acid in the encoded polypeptide. This is indicated in the accompanying Sequence Listing by use of the feature VARIANT, placement of an Xaa at the position of the polymorphic amino acid, and definition of Xaa as the two alternative amino acids. For example, if one allele of a biallelic marker is the codon CAC, which encodes histidine, while the other allele of the biallelic marker is CAA, which encodes glutamine, the Sequence Listing for the encoded polypeptide will contain an Xaa at the location of the polymorphic amino acid. In this instance, Xaa would be defined as being histidine or glutamine. 
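The Xaa convention can be sketched in Python. The sketch translates the codon carried by each allele with the standard genetic code and reports the pair of amino acids when they differ. It assumes the standard nuclear genetic code and DNA codons, and the function name is hypothetical. Amino acids are reported as one-letter codes, for example H for histidine and Q for glutamine.

```python
from itertools import product
from typing import Optional, Tuple

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def xaa_definition(codon_allele_1: str, codon_allele_2: str) -> Optional[Tuple[str, str]]:
    """Return the two alternative amino acids for Xaa, or None if the change is synonymous."""
    aa1 = CODON_TABLE[codon_allele_1.upper()]
    aa2 = CODON_TABLE[codon_allele_2.upper()]
    return None if aa1 == aa2 else (aa1, aa2)

print(xaa_definition("CAC", "CAA"))  # ('H', 'Q'): Xaa is histidine or glutamine
print(xaa_definition("CAC", "CAT"))  # None: both codons encode histidine
```

A synonymous change, such as CAC to CAT, leaves the amino acid unchanged. In that case, no Xaa would be needed at that position.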

DETAILED DESCRIPTION 

The present invention concerns polynucleotides and polypeptides related to the PG-3 gene. Oligonucleotide probes and primers hybridizing specifically with a genomic or a cDNA sequence of PG-3 are also part of the invention. A further object of the invention relates to recombinant vectors comprising any of the nucleic acid sequences described in the present invention. In particular, it relates to recombinant vectors comprising a regulatory region of PG-3 or a sequence encoding the PG-3 protein, as well as to host cells comprising said nucleic acid sequences or recombinant vectors. The invention also encompasses methods of screening for molecules which inhibit the expression of the PG-3 gene or which modulate the activity of the PG-3 protein. The invention also relates to antibodies directed specifically against such polypeptides that are useful as diagnostic reagents. 

The invention also concerns PG-3-related biallelic markers which can be used in any method of genetic analysis, including linkage studies in families, linkage disequilibrium studies in populations and association studies of case-control populations. An important aspect of the present invention is that biallelic markers allow association studies to be performed to identify genes involved in complex traits. Some of these biallelic markers may give rise to allelic variants of the PG-3 protein. 

Definitions 
typically comprises about 50%, preferably 60 to 90% weight/weight of a polypeptide or polynucleotide sample, respectively, more usually about 95%, and preferably is over about 99% pure. Polypeptide and polynucleotide purity, or homogeneity, is indicated by a number of means well known in the art, such as agarose or polyacrylamide gel electrophoresis of a sample, followed by visualizing a single band upon staining the gel. For certain purposes, higher resolution can be provided by using HPLC or other means well known in the art. As an alternative embodiment, purification of the polypeptides and polynucleotides of the present invention may be expressed as "at least" a percent purity relative to heterologous polypeptides and polynucleotides (DNA, RNA or both). As a preferred embodiment, the polypeptides and polynucleotides of the present invention are at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% pure relative to heterologous polypeptides and polynucleotides, respectively. As a further preferred embodiment, the polypeptides and polynucleotides have a purity ranging from any number, to the thousandth position, between 90% and 100%. For example, a polypeptide or polynucleotide may be at least 99.995% pure. This purity is relative to either heterologous polypeptides or polynucleotides, respectively, or is expressed as a weight/weight ratio relative to all compounds and molecules other than those existing in the carrier. Each number representing a percent purity, to the thousandth position, may be claimed as an individual species of purity. 

The term "polypeptide" refers to a polymer of amino acids without regard to the length of the polymer; thus, peptides, oligopeptides and proteins are included within the definition of polypeptide. This term also does not specify or exclude post-expression modifications of polypeptides. For example, polypeptides which include the covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term polypeptide. Also included within the definition are polypeptides which contain one or more analogs of an amino acid. Examples include non-naturally occurring amino acids, amino acids which only occur naturally in an unrelated biological system, and modified amino acids from mammalian systems. The definition further includes polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring. 

The term "recombinant polypeptide" is used herein to refer to polypeptides that have been artificially designed and which comprise at least two polypeptide sequences that are not found as contiguous polypeptide sequences in their initial natural environment. The term also refers to polypeptides which have been expressed from a recombinant polynucleotide. 

As used herein, the term "non-human animal" refers to any non-human vertebrate, including birds and, more usually, mammals. Preferred non-human animals are primates, farm animals such as swine, goats, sheep, donkeys and horses, rabbits or rodents, and more preferably rats or mice. As used herein, the term "animal" is used to refer to any vertebrate, preferably a mammal. Both the terms "animal" and "mammal" expressly embrace human subjects unless preceded by the term "non-human". 
As used herein, the term "antibody" refers to a polypeptide or group of polypeptides comprising at least one binding domain. An antibody binding domain is formed by the folding of the variable domains of an antibody molecule into three-dimensional binding spaces. The internal surface shape and charge distribution of these spaces are complementary to the features of an antigenic determinant of an antigen, which allows an immunological reaction with the antigen. Antibodies include recombinant proteins comprising the binding domains, as well as fragments, including Fab, Fab', F(ab)2 and F(ab')2 fragments. 

As used herein, an "antigenic determinant" is the portion of an antigen molecule, in this case a PG-3 polypeptide, that determines the specificity of the antigen-antibody reaction. An "epitope" refers to an antigenic determinant of a polypeptide. An epitope can comprise as few as 3 amino acids in a spatial conformation which is unique to the epitope. Generally, an epitope consists of at least 6 such amino acids, and more usually at least 8-10 such amino acids. Methods for determining the amino acids which make up an epitope include x-ray crystallography, 2-dimensional nuclear magnetic resonance, and epitope mapping. One example of epitope mapping is the Pepscan method described by Geysen et al., 1984; PCT Publication No. WO 84/03564; and PCT Publication No. WO 84/03506. 

Throughout the present specification, the expression "nucleotide sequence" may be used to designate either a polynucleotide or a nucleic acid. More precisely, the expression "nucleotide sequence" encompasses the nucleic material itself. It is thus not restricted to the sequence information (i.e., the succession of letters chosen among the four base letters) that biochemically characterizes a specific DNA or RNA molecule. 

As used interchangeably herein, the terms "nucleic acids", "oligonucleotides" and "polynucleotides" include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide in either single-stranded or duplex form. The term "nucleotide" is used herein as an adjective to describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to refer to individual nucleotides or varieties of nucleotides. In this sense, a nucleotide is a molecule, or an individual unit in a larger nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a phosphate group. For nucleotides within an oligonucleotide or polynucleotide, the phosphate group takes the form of a phosphodiester linkage. The term "nucleotide" is also used herein to encompass "modified nucleotides" which comprise at least one of the following modifications: (a) an alternative linking group, (b) an analogous form of purine, (c) an analogous form of pyrimidine, or (d) an analogous sugar. For examples of analogous linking groups, purines, pyrimidines and sugars, see, for example, PCT Publication No. WO 95/04064. The polynucleotide sequences of the invention may be prepared by any known method, including synthetic, recombinant and ex vivo generation, or a combination thereof, and may be purified by any method known in the art. 



WO 01/14550 PCT/IB00/01098 


A " promoter " refers to a DNA sequence recognized by the synthetic machinery of the cell 
required to initiate the specific transcription of a gene. 

A sequence which is "operably linked" to a regulatory sequence such as a promoter means that said regulatory element is in the correct location and orientation in relation to the nucleic acid to control RNA polymerase initiation and expression of the nucleic acid of interest. As used herein, the term "operably linked" refers to a linkage of polynucleotide elements in a functional relationship. For instance, a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the coding sequence. More precisely, two DNA molecules are said to be "operably linked" if the nature of the linkage between them does not (1) result in the introduction of a frame-shift mutation or (2) interfere with the ability of the polynucleotide containing the promoter to direct the transcription of the coding polynucleotide. Such DNA molecules may be, for example, a polynucleotide containing a promoter region and a polynucleotide encoding a desired polypeptide or polynucleotide. 

The term "primer" denotes a specific oligonucleotide sequence which is complementary to a target nucleotide sequence and is used to hybridize to the target nucleotide sequence. A primer serves as an initiation point for nucleotide polymerization catalyzed by either DNA polymerase, RNA polymerase or reverse transcriptase. 

The term "probe" denotes a defined nucleic acid segment (or nucleotide analog segment, e.g., a polynucleotide as defined herein) which can be used to identify a specific polynucleotide sequence present in samples. Said nucleic acid segment comprises a nucleotide sequence complementary to the specific polynucleotide sequence to be identified. 
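The complementarity required of primers and probes can be illustrated with a short Python sketch. The sketch assumes perfect Watson-Crick pairing between antiparallel DNA strands. It ignores mismatches, hybridization conditions and melting temperature, all of which matter in practice. The helper names are hypothetical, and the sketch only reports where a probe would pair perfectly with a target DNA strand.

```python
from typing import List

# Watson-Crick complements for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]

def probe_binding_sites(probe: str, target: str) -> List[int]:
    """0-based start positions on `target` where `probe` is perfectly complementary.

    Because paired strands are antiparallel, the probe pairs with the target
    wherever the target contains the probe's reverse complement.
    """
    site = reverse_complement(probe).upper()
    target = target.upper()
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]

print(reverse_complement("ATGC"))               # GCAT
print(probe_binding_sites("GCAT", "TTATGCAA"))  # [2]
```

The same reverse-complement relationship describes a primer annealing to its template before polymerization.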

The terms "trait" and "phenotype" are used interchangeably herein and refer to any visible, detectable or otherwise measurable property of an organism, such as, for example, symptoms of, or susceptibility to, a disease. Typically, the terms "trait" or "phenotype" are used herein to refer to symptoms of, or susceptibility to, a disease, or to a beneficial response to, or side effects related to, a treatment. Preferably, said trait can be, without being limited to, cancers, developmental diseases and neurological diseases. 

The term "allele" is used herein to refer to variants of a nucleotide sequence. A biallelic polymorphism has two forms. Typically, the first identified allele is designated as the original allele, whereas other alleles are designated as alternative alleles. Diploid organisms may be homozygous or heterozygous for an allelic form. 

The term "heterozygosity rate" is used herein to refer to the incidence of individuals in a population who are heterozygous at a particular locus. In a biallelic system, the heterozygosity rate is on average equal to $2P_a(1-P_a)$, where $P_a$ is the frequency of the less common allele. In order to be useful in genetic studies, a genetic marker should have an adequate level of heterozygosity to allow a reasonable probability that a randomly selected person will be heterozygous. 
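The heterozygosity formula can be checked numerically with a short Python sketch. It assumes the genotype counts come from a single biallelic marker. Like the formula itself, it gives the heterozygosity expected under Hardy-Weinberg equilibrium, and the function names are hypothetical. With a less common allele frequency of 0.20 or 0.30, it reproduces the heterozygosity rates of 0.32 and 0.42 quoted in the definition of biallelic markers below.

```python
def minor_allele_frequency(n_hom_1: int, n_het: int, n_hom_2: int) -> float:
    """Frequency of the less common allele from genotype counts at a biallelic marker."""
    n_alleles = 2 * (n_hom_1 + n_het + n_hom_2)
    p = (2 * n_hom_1 + n_het) / n_alleles
    return min(p, 1.0 - p)

def heterozygosity_rate(p_a: float) -> float:
    """Expected heterozygosity 2 * Pa * (1 - Pa) for a biallelic marker."""
    return 2.0 * p_a * (1.0 - p_a)

# Hypothetical sample of 100 individuals: 64 homozygous, 32 heterozygous, 4 homozygous.
p_a = minor_allele_frequency(64, 32, 4)
print(round(p_a, 2), round(heterozygosity_rate(p_a), 2))  # 0.2 0.32
print(round(heterozygosity_rate(0.30), 2))                 # 0.42
```

The rate peaks at 0.5 when both alleles are equally frequent. This is why markers whose less common allele is more frequent are more informative.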
The term "genotype" as used herein refers to the identity of the alleles present in an individual or a sample. In the context of the present invention, a genotype preferably refers to the description of the biallelic marker alleles present in an individual or a sample. "Genotyping" a sample or an individual for a biallelic marker consists of determining the specific allele or the specific nucleotide carried by an individual at a biallelic marker. 

The term "mutation" as used herein refers to a difference in DNA sequence between or
among different genomes or individuals which has a frequency below 1%.

The term "haplotype" refers to a combination of alleles present in an individual or a sample.
In the context of the present invention, a haplotype preferably refers to a combination of biallelic
marker alleles found in a given individual and which may be associated with a phenotype.

The term "polymorphism" as used herein refers to the occurrence of two or more alternative
genomic sequences or alleles between or among different genomes or individuals. "Polymorphic"
refers to the condition in which two or more variants of a specific genomic sequence can be found
in a population. A "polymorphic site" is the locus at which the variation occurs. A single
nucleotide polymorphism is the replacement of one nucleotide by another nucleotide at the
polymorphic site. Deletion of a single nucleotide or insertion of a single nucleotide also gives rise
to single nucleotide polymorphisms. In the context of the present invention, "single nucleotide
polymorphism" preferably refers to a single nucleotide substitution. Typically, between different
individuals, the polymorphic site may be occupied by two different nucleotides.

The terms "biallelic polymorphism" and "biallelic marker" are used interchangeably herein
to refer to a single nucleotide polymorphism having two alleles at a fairly high frequency in the
population. A "biallelic marker allele" refers to the nucleotide variants present at a biallelic marker
site. Typically, the frequency of the less common allele of the biallelic markers of the present
invention has been validated to be greater than 1%; preferably the frequency is greater than 10%,
more preferably the frequency is at least 20% (i.e. heterozygosity rate of at least 0.32), even more
preferably the frequency is at least 30% (i.e. heterozygosity rate of at least 0.42). A biallelic marker
wherein the frequency of the less common allele is 30% or more is termed a "high quality biallelic
marker".
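To illustrate how a marker might be placed in the frequency tiers above, the following Python sketch estimates the less common allele frequency from diploid genotype counts and maps it to a tier. The function names, tier labels and genotype counts are hypothetical; only the 1%, 10%, 20% and 30% cut-offs come from the text.

```python
def less_common_allele_frequency(n_11: int, n_12: int, n_22: int) -> float:
    """Frequency of the less common allele from diploid genotype counts.

    n_11 and n_22 are the two homozygote counts, n_12 the heterozygote count.
    """
    total_alleles = 2 * (n_11 + n_12 + n_22)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    freq_1 = (2 * n_11 + n_12) / total_alleles
    return min(freq_1, 1.0 - freq_1)


def marker_tier(freq: float) -> str:
    """Place a biallelic marker in illustrative tiers mirroring the text's cut-offs."""
    if freq >= 0.30:
        return "high quality biallelic marker"  # at least 30%
    if freq >= 0.20:
        return "more preferred"                  # at least 20%
    if freq > 0.10:
        return "preferred"                       # greater than 10%
    if freq > 0.01:
        return "validated"                       # greater than 1%
    return "below validation threshold"


freq = less_common_allele_frequency(9, 42, 49)  # hypothetical counts for 100 individuals
print(freq, marker_tier(freq))
```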

The location of nucleotides in a polynucleotide with respect to the center of the
polynucleotide is described herein in the following manner. When a polynucleotide has an odd
number of nucleotides, the nucleotide at an equal distance from the 3' and 5' ends of the
polynucleotide is considered to be "at the center" of the polynucleotide, and any nucleotide
immediately adjacent to the nucleotide at the center, or the nucleotide at the center itself, is
considered to be "within 1 nucleotide of the center." With an odd number of nucleotides in a
polynucleotide, any of the five nucleotide positions in the middle of the polynucleotide would be
considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an even
number of nucleotides, there would be a bond and not a nucleotide at the center of the
polynucleotide. Thus, either of the two central nucleotides would be considered to be "within 1
nucleotide of the center" and any of the four nucleotides in the middle of the polynucleotide would
be considered to be "within 2 nucleotides of the center", and so on. For polymorphisms which
involve the substitution, insertion or deletion of 1 or more nucleotides, the polymorphism, allele or
biallelic marker is "at the center" of a polynucleotide if the difference between the distance from the
substituted, inserted, or deleted nucleotides of the polymorphism to the 3' end of the
polynucleotide, and the distance from the substituted, inserted, or deleted nucleotides of the
polymorphism to the 5' end of the polynucleotide, is zero or one nucleotide. If this difference is 0
to 3, then the polymorphism is considered to be "within 1 nucleotide of the center." If the
difference is 0 to 5, the polymorphism is considered to be "within 2 nucleotides of the center." If the
difference is 0 to 7, the polymorphism is considered to be "within 3 nucleotides of the center," and
so on.
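The distance-difference rule for polymorphisms can be expressed compactly: a polymorphism is within k nucleotides of the center when the difference is at most 2k + 1, with k = 0 corresponding to "at the center". The Python sketch below assumes that the distance to an end is the number of nucleotides lying between the polymorphism and that end; the function name is hypothetical.

```python
def center_offset(length: int, start: int, size: int = 1) -> int:
    """Smallest k such that a polymorphism is 'within k nucleotides of the center'
    (k == 0 meaning 'at the center'), using the distance-difference rule.

    length: polynucleotide length in nucleotides
    start:  0-based index of the first substituted/inserted/deleted nucleotide
    size:   number of nucleotides involved in the polymorphism
    Assumption: the distance to an end counts the nucleotides between the
    polymorphism and that end.
    """
    if length <= 0 or size <= 0 or start < 0 or start + size > length:
        raise ValueError("polymorphism must lie inside the polynucleotide")
    dist_5prime = start
    dist_3prime = length - (start + size)
    difference = abs(dist_5prime - dist_3prime)
    # difference 0-1 -> at the center, 0-3 -> within 1, 0-5 -> within 2, ...
    # i.e. within k when difference <= 2k + 1, so the smallest such k is difference // 2.
    return difference // 2


print(center_offset(21, 10), center_offset(20, 7))  # 0 2
```

For a 21-mer with the SNP at the middle index both distances are 10, giving 0; for a 20-mer with the SNP at index 7 the distances are 7 and 12, a difference of 5, hence "within 2 nucleotides of the center".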

The term "upstream " is used herein to refer to a location which is toward the 5' end of the 
polynucleotide from a specific reference point. 

The terms "base paired" and "Watson & Crick base paired" are used interchangeably herein
to refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence
identities in a manner like that found in double-helical DNA, with thymine or uracil residues linked
to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three
hydrogen bonds (see Stryer, L., 1995).

The terms "complementary" or "complement thereof" are used herein to refer to the
sequences of polynucleotides which are capable of forming Watson & Crick base pairing with
another specified polynucleotide throughout the entirety of the complementary region. For the
purpose of the present invention, a first polynucleotide is deemed to be complementary to a second
polynucleotide when each base in the first polynucleotide is paired with its complementary base.
Complementary bases are, generally, A and T (or A and U), or C and G. "Complement" is used
herein as a synonym of "complementary polynucleotide", "complementary nucleic acid" and
"complementary nucleotide sequence". These terms are applied to pairs of polynucleotides based
solely upon their sequences and not any particular set of conditions under which the two
polynucleotides would actually bind.
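A minimal Python sketch of the complementarity test defined above, assuming both sequences are written 5' to 3' so that the second strand is compared antiparallel, as in double-helical DNA. The helper names are hypothetical, and only the four standard bases (plus U, read as T) are handled.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def _to_dna(seq: str) -> str:
    """Upper-case a sequence and read uracil (RNA) as thymine."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' sequence, returned 5'->3' as DNA."""
    return "".join(_COMPLEMENT[base] for base in reversed(_to_dna(seq)))


def is_complementary(first: str, second: str) -> bool:
    """True if every base of `first` pairs (A-T/U, C-G) with `second`
    over their entire length, both strands being written 5'->3'."""
    if len(first) != len(second):
        return False
    return reverse_complement(first) == _to_dna(second)


print(reverse_complement("ATGCC"))  # GGCAT
```

Any other character (for example an IUPAC ambiguity code) raises a KeyError in this sketch.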

Variants and Fragments
1- Polynucleotides

The invention also relates to variants and fragments of the polynucleotides described herein, 
particularly of a PG-3 gene containing one or more biallelic markers according to the invention. 

Variants of polynucleotides, as the term is used herein, are polynucleotides that differ from
a reference polynucleotide. A variant of a polynucleotide may be a naturally occurring variant such
as a naturally occurring allelic variant, or it may be a variant that is not known to occur naturally.
Such non-naturally occurring variants of the polynucleotide may be made by mutagenesis
techniques, including those applied to polynucleotides, cells or organisms. Generally, differences
are limited so that the nucleotide sequences of the reference and the variant are closely similar
overall and, in many regions, identical.

Variants of polynucleotides according to the invention include, without being limited to,
nucleotide sequences which are at least 95% identical to a polynucleotide selected from the group
consisting of the nucleotide sequences of SEQ ID Nos 1 and 2, or to any polynucleotide fragment of
at least 12 consecutive nucleotides of a polynucleotide selected from the group consisting of the
nucleotide sequences of SEQ ID Nos 1 and 2, and preferably at least 99% identical, more
particularly at least 99.5% identical, and most preferably at least 99.8% identical to a polynucleotide
selected from the group consisting of the nucleotide sequences of SEQ ID Nos 1 and 2, or to any
polynucleotide fragment of at least 12 consecutive nucleotides of a polynucleotide selected from the
group consisting of the nucleotide sequences of SEQ ID Nos 1 and 2.

Nucleotide changes present in a variant polynucleotide may be silent, which means that
they do not alter the amino acids encoded by the polynucleotide. However, nucleotide changes may
also result in amino acid substitutions, additions, deletions, fusions and truncations in the
polypeptide encoded by the reference sequence. The substitutions, deletions or additions may
involve one or more nucleotides. The variants may be altered in coding or non-coding regions or
both. Alterations in the coding regions may produce conservative or non-conservative amino acid
substitutions, deletions or additions.

In the context of the present invention, particularly preferred embodiments are those in
which the polynucleotides encode polypeptides which retain substantially the same biological
function or activity as the mature PG-3 protein, or those in which the polynucleotides encode
polypeptides which maintain or increase a particular biological activity while reducing a second
biological activity.

A polynucleotide fragment is a polynucleotide having a sequence that is entirely the same
as part but not all of a given nucleotide sequence, preferably the nucleotide sequence of a PG-3
gene, and variants thereof. The fragment can be a portion of an intron or an exon of a PG-3 gene. It
can also be a portion of the regulatory regions of PG-3. Preferably, such fragments comprise at
least one of the biallelic markers A1 to A80 or the complements thereto, or a biallelic marker in
linkage disequilibrium with one or more of the biallelic markers A1 to A80.

Such fragments may be "free-standing", i.e. not part of or fused to other polynucleotides, or 
they may be comprised within a single larger polynucleotide of which they form a part or region. 
Indeed, several of these fragments may be present within a single larger polynucleotide. 

Optionally, such fragments may comprise, consist of, or consist essentially of a contiguous
span of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, 500 or 1000 nucleotides in
length. A set of preferred fragments contains at least one of the biallelic markers A1 to A80 of the
PG-3 gene which are described herein, or the complements thereto.
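Enumerating the contiguous spans of a given length that contain a particular biallelic marker is a simple sliding-window calculation. In this hypothetical Python sketch the marker position and sequence length are placeholders; the actual positions of markers A1 to A80 are not given in this section, and the function name is illustrative.

```python
from typing import Iterator, Tuple


def spans_containing(seq_length: int, position: int, span: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end), 1-based and inclusive, of every contiguous span of `span`
    nucleotides inside a sequence of `seq_length` nucleotides that includes `position`."""
    if not 1 <= position <= seq_length:
        raise ValueError("position lies outside the sequence")
    first_start = max(1, position - span + 1)
    last_start = min(position, seq_length - span + 1)
    for start in range(first_start, last_start + 1):
        yield start, start + span - 1


# Hypothetical example: a marker at position 50 of a 100-nt sequence, 12-nt fragments.
windows = list(spans_containing(100, 50, 12))
print(len(windows), windows[0], windows[-1])  # 12 (39, 50) (50, 61)
```

Near either end of the sequence fewer windows fit, and a span longer than the sequence yields none.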
2- Polypeptides

The invention also relates to variants, fragments, analogs and derivatives of the 
polypeptides described herein, including mutated PG-3 proteins. 

The variant may be 1) one in which one or more of the amino acid residues are substituted
with a conserved or non-conserved amino acid residue, and such substituted amino acid residue may
or may not be one encoded by the genetic code, or 2) one in which one or more of the amino acid
residues includes a substituent group, or 3) one in which the mutated PG-3 is fused with another
compound, such as a compound to increase the half-life of the polypeptide (for example,
polyethylene glycol), or 4) one in which additional amino acids are fused to the mutated PG-3,
such as a leader or secretory sequence or a sequence which is employed for purification of the
mutated PG-3 or a preprotein sequence. Such variants are deemed to be within the scope of those
skilled in the art.

A polypeptide fragment is a polypeptide having a sequence that is entirely the same as part
but not all of a given polypeptide sequence, preferably a polypeptide encoded by a PG-3 gene and
variants thereof.

In the case of an amino acid substitution in the amino acid sequence of a polypeptide
according to the invention, one or several amino acids can be replaced by "equivalent" amino acids.
The expression "equivalent" amino acid is used herein to designate any amino acid that may be
substituted for one of the amino acids having similar properties, such that one skilled in the art of
peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide
to be substantially unchanged. Generally, the following groups of amino acids represent equivalent
changes: (1) Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr; (2) Cys, Ser, Tyr, Thr; (3) Val, Ile, Leu,
Met, Ala, Phe; (4) Lys, Arg, His; (5) Phe, Tyr, Trp, His.
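The equivalence groups can be checked mechanically. In the Python sketch below the groups are transcribed from the list above, and the function name is hypothetical.

```python
# Groups of "equivalent" amino acids transcribed from the list above (three-letter codes).
EQUIVALENT_GROUPS = (
    frozenset({"Ala", "Pro", "Gly", "Glu", "Asp", "Gln", "Asn", "Ser", "Thr"}),
    frozenset({"Cys", "Ser", "Tyr", "Thr"}),
    frozenset({"Val", "Ile", "Leu", "Met", "Ala", "Phe"}),
    frozenset({"Lys", "Arg", "His"}),
    frozenset({"Phe", "Tyr", "Trp", "His"}),
)


def is_equivalent_substitution(original: str, replacement: str) -> bool:
    """True if both residues belong to at least one common equivalence group."""
    return any(original in group and replacement in group for group in EQUIVALENT_GROUPS)


print(is_equivalent_substitution("Lys", "Arg"), is_equivalent_substitution("Lys", "Asp"))  # True False
```

Because the groups overlap (Ala appears in groups 1 and 3, His in groups 4 and 5), the relation is not transitive: Gly/Ala and Ala/Phe are each equivalent pairs, but Gly/Phe is not.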

A specific embodiment of a modified PG-3 peptide molecule of interest according to the
present invention includes, but is not limited to, a peptide molecule which is resistant to
proteolysis, or a peptide in which the -CONH- peptide bond is modified and replaced by a (CH2NH)
reduced bond, a (NHCO) retro-inverso bond, a (CH2-O) methylene-oxy bond, a (CH2-S)
thiomethylene bond, a (CH2CH2) carba bond, a (CO-CH2) ketomethylene bond, a (CHOH-CH2)
hydroxyethylene bond, a (N-N) bond, an E-alkene bond or also a -CH=CH- bond. The invention
also encompasses a human PG-3 polypeptide or a fragment or a variant thereof in which at least one
peptide bond has been modified as described above.

Such fragments may be "free-standing", i.e. not part of or fused to other polypeptides, or
they may be included within a single larger polypeptide of which they form a part or region.
Indeed, several fragments may be included within a single larger polypeptide.

As representative examples of polypeptide fragments of the invention, there may be
mentioned those which are from about 5, 6, 7, 8, 9 or 10 to 15, 10 to 20, 15 to 40, or 30 to 55 amino
acids long. Preferred are those fragments containing at least one amino acid mutation in the PG-3
protein.

Identity Between Nucleic Acids Or Polypeptides

The terms "percentage of sequence identity" and "percentage homology" are used
interchangeably herein to refer to comparisons among polynucleotides and polypeptides, and are
determined by comparing two optimally aligned sequences over a comparison window, wherein the
portion of the polynucleotide or polypeptide sequence in the comparison window may comprise
additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise
additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by
determining the number of positions at which the identical nucleic acid base or amino acid residue
occurs in both sequences to yield the number of matched positions, dividing the number of matched
positions by the total number of positions in the window of comparison and multiplying the result
by 100 to yield the percentage of sequence identity. Homology is evaluated using any of the variety
of sequence comparison algorithms and programs known in the art. Such algorithms and programs
include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA, and
CLUSTALW (Pearson and Lipman, 1988; Altschul et al., 1990; Thompson et al., 1994; Higgins
et al., 1996; Altschul et al., 1993). In a particularly preferred embodiment, protein and nucleic acid
sequence homologies are evaluated using the Basic Local Alignment Search Tool ("BLAST"),
which is well known in the art (see, e.g., Karlin and Altschul, 1990; Altschul et al., 1990, 1993,
1997). In particular, five specific BLAST programs are used to perform the following tasks:

(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein
sequence database;

(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence
database;

(3) BLASTX compares the six-frame conceptual translation products of a query nucleotide
sequence (both strands) against a protein sequence database;

(4) TBLASTN compares a query protein sequence against a nucleotide sequence database
translated in all six reading frames (both strands); and

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence against
the six-frame translations of a nucleotide sequence database.

The BLAST programs identify homologous sequences by identifying similar segments,
which are referred to herein as "high-scoring segment pairs," between a query amino or nucleic acid
sequence and a test sequence which is preferably obtained from a protein or nucleic acid sequence
database. High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring
matrix, many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62
matrix (Gonnet et al., 1992; Henikoff and Henikoff, 1993). Less preferably, the PAM or PAM250
matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978). The BLAST programs
evaluate the statistical significance of all high-scoring segment pairs identified, and preferably
select those segments which satisfy a user-specified threshold of significance, such as a
user-specified percent homology. Preferably, the statistical significance of a high-scoring segment
pair is evaluated using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul,
1990). The BLAST programs may be used with the default parameters which are implemented in
the absence of further instructions from the user. Alternatively, the BLAST programs may be used
with parameters specified by the user.
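The percentage of sequence identity defined earlier in this section can be computed in a few lines once an optimal alignment is available, for example from one of the programs listed above. This Python sketch does not perform the alignment itself. It assumes the two aligned strings are already padded with "-" gap characters and takes the comparison window to be every alignment column; both are assumptions, since the text fixes neither the gap symbol nor the window boundaries. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity between two pre-aligned sequences.

    Matched positions (same residue, not a gap) are divided by the number of
    positions in the comparison window and multiplied by 100. Assumption: the
    window is every column of the alignment, gap columns included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty comparison window")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 2))  # 77.78
```

In the example, 7 of the 9 columns are identical non-gap residues, giving 7/9 x 100.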
Stringent Hybridization Conditions 

By way of example and not limitation, procedures using conditions of high stringency are
as follows: Prehybridization of filters containing DNA is carried out for 8 h to overnight at 65°C in
buffer composed of 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll,
0.02% BSA, and 500 µg/ml denatured salmon sperm DNA. Filters are hybridized for 48 h at 65°C,
the preferred hybridization temperature, in prehybridization mixture containing 100 µg/ml
denatured salmon sperm DNA and 5-20 × 10⁶ cpm of ³²P-labeled probe. Alternatively, the
hybridization step can be performed at 65°C in the presence of SSC buffer, 1X SSC corresponding
to 0.15 M NaCl and 0.015 M Na citrate. Subsequently, filter washes can be done at 37°C for 1 h in a
solution containing 2X SSC, 0.01% PVP, 0.01% Ficoll, and 0.01% BSA, followed by a wash in
0.1X SSC at 50°C for 45 min. Alternatively, filter washes can be performed in a solution
containing 2X SSC and 0.1% SDS, or 0.5X SSC and 0.1% SDS, or 0.1X SSC and 0.1% SDS at
68°C for 15-minute intervals. Following the wash steps, the hybridized probes are detectable by
autoradiography. Other conditions of high stringency which may be used are well known in the art
and are cited in Sambrook et al., 1989, and Ausubel et al., 1989. These hybridization conditions
are suitable for a nucleic acid molecule of about 20 nucleotides in length. The hybridization
conditions described above must of course be adapted according to the length of the
desired nucleic acid, following techniques well known to the one skilled in the art. The suitable
hybridization conditions may, for example, be adapted according to the teachings disclosed in Hames
and Higgins (1985) or in Sambrook et al. (1989).
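Adapting conditions to probe length is often guided by an estimated melting temperature. The Python sketch below uses a widely quoted empirical approximation, Tm = 81.5 + 16.6 log10([Na+]) + 0.41 (%GC) - 600/N, where N is the probe length. This formula is not taken from this document and is only a rough guide: it ignores formamide, mismatches and nearest-neighbour sequence effects. The function name is hypothetical, and the Na+ concentration must be supplied by the user rather than derived from an SSC dilution.

```python
import math


def approx_tm_celsius(seq: str, na_molar: float) -> float:
    """Rough DNA duplex melting temperature (degrees C) from the empirical rule
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/N."""
    if not seq:
        raise ValueError("empty sequence")
    if na_molar <= 0:
        raise ValueError("[Na+] must be positive")
    seq = seq.upper()
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / len(seq)


# Hypothetical 20-mer probe evaluated at two monovalent cation concentrations.
probe = "ACGTGCTAGCTAGGCTAACG"
for na in (1.0, 0.1):
    print(na, round(approx_tm_celsius(probe, na), 1))
```

Each tenfold reduction in [Na+] lowers the estimate by 16.6°C, which is why lower-salt washes are more stringent.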

GENOMIC SEQUENCES OF THE PG-3 GENE 
The present invention concerns the genomic sequence of PG-3. The present invention
encompasses the PG-3 gene, or PG-3 genomic sequences consisting of, consisting essentially of, or
comprising the sequence of SEQ ID No 1, sequences complementary thereto, as well as fragments
and variants thereof. These polynucleotides may be purified, isolated, or recombinant.

The invention also encompasses a purified, isolated, or recombinant polynucleotide
comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, or 95% nucleotide identity with
the nucleotide sequence of SEQ ID No 1 or a complementary sequence thereto or a fragment
thereof. The nucleotide differences with regard to the nucleotide sequence of SEQ ID No 1 may be
generally randomly distributed throughout the entire nucleic acid. Nevertheless, preferred nucleic
acids are those wherein the nucleotide differences with regard to the nucleotide sequence of SEQ ID
No 1 are predominantly located outside the coding sequences contained in the exons. These nucleic
acids, as well as their fragments and variants, may be used as oligonucleotide primers or probes in
order to detect the presence of a copy of the PG-3 gene in a test sample, or alternatively in order to
amplify a target nucleotide sequence within the PG-3 sequences.

Another object of the invention relates to a purified, isolated, or recombinant nucleic acid 
that hybridizes with the nucleotide sequence of SEQ ID No 1 or a complementary sequence thereto 
or a variant thereof, under the stringent hybridization conditions as defined above. 

Particularly preferred nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-97921, 98517-103471, 103603-108222, 108390-109221,
109324-114409, 114538-115723, 115957-122102, 122225-126876, 127033-157212, 157808-240825.
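Whether a given span satisfies the "at least 1, 2, 3, 5, or 10 positions" condition can be computed by intersecting it with the listed ranges. In the Python sketch below the ranges are transcribed from the first list of positions above; the function names are hypothetical. Because the listed ranges do not overlap, summing per-range overlaps never double counts.

```python
# Position ranges of SEQ ID No 1 from the list above (1-based, inclusive).
LISTED_RANGES = (
    (1, 97921), (98517, 103471), (103603, 108222), (108390, 109221),
    (109324, 114409), (114538, 115723), (115957, 122102), (122225, 126876),
    (127033, 157212), (157808, 240825),
)


def listed_positions_in_span(start: int, end: int, ranges=LISTED_RANGES) -> int:
    """Count the listed positions that fall inside the span start..end (inclusive)."""
    if start > end:
        raise ValueError("start must not exceed end")
    covered = 0
    for lo, hi in ranges:
        overlap = min(end, hi) - max(start, lo) + 1
        if overlap > 0:
            covered += overlap
    return covered


def span_qualifies(start: int, end: int, minimum: int = 1) -> bool:
    """True if the span comprises at least `minimum` of the listed positions."""
    return listed_positions_in_span(start, end) >= minimum


# A hypothetical 20-nt span straddling the end of the first range.
print(listed_positions_in_span(97915, 97934), span_qualifies(97915, 97934, minimum=10))  # 7 False
```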

Additional preferred nucleic acids of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70,
80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements thereof,
wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-10000, 10001-20000, 20001-30000, 30001-40000, 40001-50000,
50001-60000, 60001-70000, 70001-80000, 80001-90000, 90001-97921, 98517-103471,
103603-108222, 108390-109221, 109324-114409, 114538-115723, 115957-122102, 122225-126876,
127033-157212, 157808-159000, 159001-160000, 160001-170000, 170001-180000,
180001-190000, 190001-200000, 200001-210000, 210001-220000, 220001-230000, 230001-240825. It
should be noted that nucleic acid fragments of any size and sequence may also be comprised by the
polynucleotides described in this section.

The PG-3 genomic nucleic acid comprises 14 exons. The exon positions in SEQ ID No 1 
are detailed below in Table A. 



Table A: Positions of the exons and introns of the PG-3 gene in SEQ ID No 1

Exon   Beginning   End       Intron   Beginning   End
A      2001        2079      A-B      2080        4626
B      4627        4718      B-C      4719        10114
C      10115       10233     C-D      10234       26809
D      26810       26897     D-E      26898       31356
E      31357       31471     E-F      31472       34260
F      34261       34404     F-S      34405       37376
S      37377       37466     S-T      37467       39703
T      39704       40858     T-G      40859       50435
G      50436       50545     G-H      50546       72880
H      72881       72918     H-I      72919       75988
I      75989       76151     I-J      76152       95110
J      95111       95188     J-K      95189       216014
K      216015      216252    K-L      216253      237525
L      237526      238825

Thus, the invention embodies purified, isolated, or recombinant polynucleotides comprising
a nucleotide sequence selected from the group consisting of the 14 exons of the PG-3 gene, or a
sequence complementary thereto. The invention also relates to purified, isolated, or recombinant
nucleic acids comprising a combination of at least two exons of the PG-3 gene, wherein the
polynucleotides are arranged within the nucleic acid, from the 5'-end to the 3'-end of said nucleic
acid, in the same order as in SEQ ID No 1.

Intron A-B refers to the nucleotide sequence located between Exon A and Exon B, and so
on. The position of the introns is detailed in Table A. Intron J-K is particularly large: it is about
120 kb in length and comprises the whole angiopoietin gene.
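The coordinates in Table A can be cross-checked with a few lines of Python. This is an illustrative sketch with hypothetical names: each intron is derived as the gap between consecutive exons, and the start of exon C (10115) and the end of exon J (95188) are taken as the positions adjacent to introns B-C and J-K.

```python
# Exon coordinates in SEQ ID No 1 from Table A (1-based, inclusive).
EXONS = [
    ("A", 2001, 2079), ("B", 4627, 4718), ("C", 10115, 10233),
    ("D", 26810, 26897), ("E", 31357, 31471), ("F", 34261, 34404),
    ("S", 37377, 37466), ("T", 39704, 40858), ("G", 50436, 50545),
    ("H", 72881, 72918), ("I", 75989, 76151), ("J", 95111, 95188),
    ("K", 216015, 216252), ("L", 237526, 238825),
]


def length(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive 1-based interval."""
    return end - start + 1


def introns(exons):
    """Derive each intron as the gap between consecutive exons."""
    return [
        (f"{a}-{b}", a_end + 1, b_start - 1)
        for (a, _, a_end), (b, b_start, _) in zip(exons, exons[1:])
    ]


exon_total = sum(length(s, e) for _, s, e in EXONS)
intron_lengths = {name: length(s, e) for name, s, e in introns(EXONS)}
print(exon_total)             # 3809
print(intron_lengths["J-K"])  # 120826
```

With these values the 14 exon lengths add up to 3809 nt, the length of the cDNA of SEQ ID No 2, and intron J-K spans 120,826 nt, consistent with the figure of about 120 kb given above. The derived intron boundaries reproduce the intron columns of Table A.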

Thus, the invention embodies purified, isolated, or recombinant polynucleotides comprising 
a nucleotide sequence selected from the group consisting of the 13 introns of the PG-3 gene, or a 
sequence complementary thereto. 

While this section is entitled "Genomic Sequences of PG-3," it should be noted that nucleic 
acid fragments of any size and sequence may also be comprised by the polynucleotides described in 
this section, flanking the genomic sequences of PG-3 on either side or between two or more such 
genomic sequences. 

PG-3 cDNA SEQUENCES

The expression of the PG-3 gene has been shown to lead to the production of at least one
mRNA species, whose nucleic acid sequence is set forth in SEQ ID No 2. Three cDNAs have been
independently cloned. They all have the same size but differ from one another, and from the
genomic sequence, at numerous polymorphic positions. These polymorphisms are indicated in
the appended sequence listing by the use of the feature "variation" in SEQ ID No 2.

Another object of the invention is a purified, isolated, or recombinant nucleic acid
comprising the nucleotide sequence of SEQ ID No 2, complementary sequences thereto, as well as
allelic variants, and fragments thereof. Moreover, preferred polynucleotides of the invention
include purified, isolated, or recombinant PG-3 cDNAs consisting of, consisting essentially of, or
comprising the sequence of SEQ ID No 2. Particularly preferred nucleic acids of the invention
include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least
12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID
No 2 or the complements thereof. Additional preferred embodiments of the invention include
isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15,
18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 2
or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the
following nucleotide positions of SEQ ID No 2: 1-500, 501-1000, 1001-1500, 1501-2000,
2001-2500, 2501-3000, 3001-3500, 3501-3809.

The invention also pertains to a purified or isolated nucleic acid comprising a
polynucleotide having at least 80, 85, 90, or 95% nucleotide identity with a polynucleotide of SEQ
ID No 2, advantageously 99% nucleotide identity, preferably 99.5% nucleotide identity and most
preferably 99.8% nucleotide identity with a polynucleotide of SEQ ID No 2, or a sequence
complementary thereto or a biologically active fragment thereof.

Another object of the invention relates to purified, isolated or recombinant nucleic acids
comprising a polynucleotide that hybridizes, under the stringent hybridization conditions defined
herein, with a polynucleotide of SEQ ID No 2, or a sequence complementary thereto or a variant
thereof or a biologically active fragment thereof.

The cDNA of SEQ ID No 2 includes a 5'-UTR region starting from the nucleotide at
position 1 and ending at the nucleotide in position 57 of SEQ ID No 2. The cDNA of SEQ ID No 2
includes a 3'-UTR region starting from the nucleotide at position 2566 and ending at the nucleotide
at position 3809 of SEQ ID No 2. The polyadenylation signal starts from the nucleotide at position
3795 and ends at the nucleotide in position 3800 of SEQ ID No 2.

Consequently, the invention concerns a purified, isolated, or recombinant nucleic acid
comprising a nucleotide sequence of the 5'UTR of the PG-3 cDNA, a sequence complementary
thereto, or an allelic variant thereof. The invention also concerns a purified, isolated, or
recombinant nucleic acid comprising a nucleotide sequence of the 3'UTR of the PG-3 cDNA, a
sequence complementary thereto, or an allelic variant thereof.

While this section is entitled "PG-3 cDNA Sequences," it should be noted that nucleic acid
fragments of any size and sequence may also be comprised by the polynucleotides described in this
section, flanking the PG-3 sequences on either side or between two or more such PG-3 sequences.

CODING REGIONS

The PG-3 open reading frame is contained in the corresponding mRNA of SEQ ID No 2.
More precisely, the effective PG-3 coding sequence (CDS) includes the region between nucleotide
position 58 (first nucleotide of the ATG codon) and nucleotide position 2565 (end nucleotide of the
TGA codon) of SEQ ID No 2.
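The UTR and CDS coordinates given for SEQ ID No 2 can be checked for internal consistency with simple arithmetic. A short Python sketch (the names are illustrative; the coordinates are those stated in the text):

```python
# Feature coordinates in SEQ ID No 2 as given in the text (1-based, inclusive).
FEATURES = {
    "5'-UTR": (1, 57),
    "CDS": (58, 2565),            # from the A of ATG to the last base of TGA
    "3'-UTR": (2566, 3809),
    "polyA signal": (3795, 3800),
}


def span_length(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive 1-based interval."""
    return end - start + 1


cds_len = span_length(*FEATURES["CDS"])
assert cds_len % 3 == 0, "CDS should be a whole number of codons"
codons = cds_len // 3
residues = codons - 1  # the final codon (TGA) is a stop codon, not an amino acid

# The three regions should tile the cDNA without gaps or overlaps.
assert FEATURES["5'-UTR"][1] + 1 == FEATURES["CDS"][0]
assert FEATURES["CDS"][1] + 1 == FEATURES["3'-UTR"][0]

print(cds_len, codons, residues)  # 2508 836 835
```

The CDS is 2508 nt, i.e. 836 codons including the TGA stop, leaving 835 amino acids, which matches the last position (835) used for SEQ ID No 3 in this section.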
The present invention also embodies isolated, purified, and recombinant polynucleotides
which encode a polypeptide comprising a contiguous span of at least 6 amino acids, preferably at
least 8 or 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of
SEQ ID No 3. Preferably, the present invention also embodies isolated, purified, and recombinant
polynucleotides which encode a polypeptide comprising a contiguous span of at least 6 amino acids,
preferably at least 8 or 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100
amino acids of SEQ ID No 3, wherein said contiguous span comprises at least 1, 2, 3, 5, or
10 of the following amino acid positions of SEQ ID No 3: 1-100, 101-200, 201-300, 301-400,
401-500, 501-600, 601-700, 701-835.

The above disclosed polynucleotide that contains the coding sequence of the PG-3 gene
may be expressed in a desired host cell or a desired host organism when this polynucleotide is
placed under the control of suitable expression signals. The expression signals may be either the
expression signals contained in the regulatory regions in the PG-3 gene of the invention or,
alternatively, exogenous regulatory nucleic sequences. Such a polynucleotide, when
placed under the suitable expression signals, may also be inserted in a vector for its expression
and/or amplification.

REGULATORY SEQUENCES OF PG-3

As mentioned, the genomic sequence of the PG-3 gene contains regulatory sequences both
in the non-transcribed 5'-flanking region and in the non-transcribed 3'-flanking region that border
the PG-3 coding region containing the 14 exons of this gene.

The 5' regulatory region of the PG-3 gene is localized between the nucleotide in position 1
and the nucleotide in position 2000 of the nucleotide sequence of SEQ ID No 1. The 3' regulatory
region of the PG-3 gene is localized between nucleotide position 238826 and nucleotide position
240825 of SEQ ID No 1.

Polynucleotides derived from the 5' and 3' regulatory regions are useful in order to detect
the presence of at least a copy of a nucleotide sequence of SEQ ID No 1 or a fragment thereof in a
test sample.

The promoter activity of the 5' regulatory regions contained in PG-3 can be assessed as 
described below. 

In order to identify the relevant biologically active polynucleotide fragments or variants of
SEQ ID No 1, one of skill in the art will refer to the book of Sambrook et al. (1989), which describes
the use of a recombinant vector carrying a marker gene (i.e. beta-galactosidase, chloramphenicol
acetyl transferase, etc.) whose expression is detected when the gene is placed under the control of a
biologically active polynucleotide fragment or variant of SEQ ID No 1. Genomic sequences
located upstream of the first exon of the PG-3 gene are cloned into a suitable promoter reporter
vector, such as the pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer, or pEGFP-1
Promoter Reporter vectors available from Clontech, or the pGL2-basic or pGL3-basic promoterless
luciferase reporter gene vectors from Promega. Briefly, each of these promoter reporter vectors
includes multiple cloning sites positioned upstream of a reporter gene encoding a readily assayable
protein such as secreted alkaline phosphatase, luciferase, β-galactosidase, or green fluorescent
protein. The sequences upstream of the PG-3 coding region are inserted into the cloning sites
upstream of the reporter gene in both orientations and introduced into an appropriate host cell. The
level of reporter protein is assayed and compared to the level obtained from a vector which lacks an
insert in the cloning site. An elevated expression level in the vector containing the
insert with respect to the control vector indicates the presence of a promoter in the insert. If
necessary, the upstream sequences can be cloned into vectors which contain an enhancer for
increasing transcription levels from weak promoter sequences. A significant level of expression
above that observed with the vector lacking an insert indicates that a promoter sequence is present
in the inserted upstream sequence.
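The comparison described above reduces to a ratio of reporter signal from the insert-bearing vector over the empty-vector control. The sketch below is illustrative only: the function names, the triplicate readings, and the 2-fold cutoff are hypothetical choices and are not values taken from this specification.

```python
from statistics import mean

def fold_over_empty_vector(insert_readings, empty_readings):
    """Mean reporter signal of an insert construct divided by the empty-vector mean."""
    baseline = mean(empty_readings)
    if baseline <= 0:
        raise ValueError("empty-vector baseline must be positive")
    return mean(insert_readings) / baseline

def has_promoter_activity(insert_readings, empty_readings, threshold=2.0):
    """Hypothetical call rule: the insert counts as active at >= `threshold`-fold over baseline."""
    return fold_over_empty_vector(insert_readings, empty_readings) >= threshold

# Hypothetical luciferase readings (arbitrary units): one upstream fragment cloned
# in both orientations, plus the promoterless (empty) control vector.
empty = [100.0, 110.0, 90.0]
forward = [1200.0, 1100.0, 1300.0]
reverse = [150.0, 130.0, 140.0]

for label, readings in (("forward", forward), ("reverse", reverse)):
    fold = fold_over_empty_vector(readings, empty)
    print(label, round(fold, 1), has_promoter_activity(readings, empty))
```

Scoring each orientation separately mirrors the text's instruction to insert the upstream sequences in both orientations; in this invented data set only one orientation clears the cutoff.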

Promoter sequences within the upstream genomic DNA may be further defined by
constructing nested 5' and/or 3' deletions in the upstream DNA using conventional techniques such
as Exonuclease III or appropriate restriction endonuclease digestion. The resulting deletion
fragments can be inserted into the promoter reporter vector to determine whether the deletion has
reduced or obliterated promoter activity, as described, for example, by Coles et al. (1998). In
this way, the boundaries of the promoters may be defined. If desired, potential individual
regulatory sites within the promoter may be identified using site-directed mutagenesis or linker
scanning to obliterate potential transcription factor binding sites within the promoter individually or
in combination. The effects of these mutations on transcription levels may be determined by
inserting the mutated sequences into cloning sites in promoter reporter vectors. This type of assay is
well-known to those skilled in the art and is described in WO 97/17359; US Patent No. 5,374,544;
EP 582 796; US Patent No. 5,698,389; US Patent No. 5,643,746; US Patent No. 5,502,176; and
US Patent No. 5,266,488.
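One way to read a nested deletion series: each construct retains a stretch of upstream sequence, and the promoter boundary lies between the shortest construct that still shows activity and the next shorter construct, which has lost it. The sketch below assumes 5' deletions that all share the same 3' end, hypothetical activity values expressed as fold over the empty vector, and a hypothetical 2-fold cutoff.

```python
def map_5prime_boundary(deletions, threshold=2.0):
    """Return (shortest active length, next shorter inactive length), or None.

    `deletions` maps the retained upstream length (bp, measured from a fixed 3' end)
    to reporter activity expressed as fold over the empty vector.
    """
    lengths = sorted(deletions)  # shortest construct first
    active = [n for n in lengths if deletions[n] >= threshold]
    if not active:
        return None
    shortest_active = active[0]
    # Every construct shorter than the shortest active one is inactive by definition.
    shorter = [n for n in lengths if n < shortest_active]
    return shortest_active, (shorter[-1] if shorter else None)

# Hypothetical nested 5' deletion series (retained bp -> fold over empty vector).
series = {1500: 11.8, 1000: 12.3, 500: 10.9, 400: 9.7, 300: 1.3, 200: 1.1}
print(map_5prime_boundary(series))
```

Here the result places the 5' boundary of the minimal active region between 300 and 400 bp upstream; a real series with internal repressor elements would not be monotonic and would need closer inspection than this simple rule.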

The strength and the specificity of the promoter of the PG-3 gene can be assessed through
the expression levels of a detectable polynucleotide operably linked to the PG-3 promoter in
different types of cells and tissues. The detectable polynucleotide may be either a polynucleotide
that specifically hybridizes with a predefined oligonucleotide probe, or a polynucleotide encoding a
detectable protein, including a PG-3 polypeptide or a fragment or a variant thereof. This type of
assay is well-known to those skilled in the art and is described in US Patent No. 5,502,176 and US
Patent No. 5,266,488. Some of the methods are discussed in more detail below.

Polynucleotides carrying the regulatory elements located at the 5' end and at the 3' end of
the PG-3 coding region may advantageously be used to control the transcriptional and translational
activity of a heterologous polynucleotide of interest.

Thus, the present invention also concerns a purified or isolated nucleic acid comprising a
polynucleotide which is selected from the group consisting of the 5' and 3' regulatory regions, or a
sequence complementary thereto or a biologically active fragment or variant thereof.

The invention also pertains to a purified or isolated nucleic acid comprising a
polynucleotide having at least 80, 85, 90, or 95% nucleotide identity with a polynucleotide selected
from the group consisting of the 5' and 3' regulatory regions, advantageously 99% nucleotide
identity, preferably 99.5% nucleotide identity and most preferably 99.8% nucleotide identity with a
polynucleotide selected from the group consisting of the 5' and 3' regulatory regions, or a sequence
complementary thereto or a variant thereof or a biologically active fragment thereof.
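Percent nucleotide identity depends on how the two sequences are aligned and on how gaps are counted, and this passage does not fix either convention. The sketch below assumes the two sequences have already been aligned (gaps written as "-") and counts identical bases over the aligned columns in which neither sequence has a gap; that convention is an assumption, and other tools divide by the full alignment length instead.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over ungapped columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gapped columns are skipped under this convention
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared

def meets_identity(aligned_a: str, aligned_b: str, minimum: float = 80.0) -> bool:
    """True if identity reaches one of the thresholds listed above (e.g. 80, 90, 99.5)."""
    return percent_identity(aligned_a, aligned_b) >= minimum

ref = "ACGTACGTACGTACGTACGT"
var = "ACGTACGTACGAACGTACGT"  # one mismatch in 20 columns
print(percent_identity(ref, var))  # 95.0
```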

Another object of the invention relates to purified, isolated or recombinant nucleic acids
comprising a polynucleotide that hybridizes, under the stringent hybridization conditions defined
herein, with a polynucleotide selected from the group consisting of the nucleotide sequences of the
5' and 3' regulatory regions, or a sequence complementary thereto or a variant thereof or a
biologically active fragment thereof.

Preferred fragments of the 5' regulatory region have a length of about 1500 or 1000
nucleotides, preferably of about 500 nucleotides, more preferably about 400 nucleotides, even more
preferably 300 nucleotides and most preferably about 200 nucleotides.

Preferred fragments of the 3' regulatory region are at least 50, 100, 150, 200, 300 or 400
bases in length.

"Biologically active" polynucleotide derivatives of SEQ ID No 1 are polynucleotides
comprising, or alternatively consisting essentially of or consisting of, a fragment of said
polynucleotide which is functional as a regulatory region for expressing a recombinant polypeptide
or a recombinant polynucleotide in a recombinant cell host. Such a fragment may act either as an
enhancer or as a repressor.

For the purpose of the invention, a nucleic acid or polynucleotide is "functional" as a
regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide if said
regulatory polynucleotide contains nucleotide sequences which contain transcriptional and
translational regulatory information, and such sequences are "operably linked" to nucleotide
sequences which encode the desired polypeptide or the desired polynucleotide.

The regulatory polynucleotides of the invention may be prepared from the nucleotide
sequence of SEQ ID No 1 by cleavage using suitable restriction enzymes, as described for example
in the book of Sambrook et al. (1989). The regulatory polynucleotides may also be prepared by
digestion of SEQ ID No 1 by an exonuclease enzyme, such as Bal31 (Wabiko et al., 1986). These
regulatory polynucleotides can also be prepared by nucleic acid chemical synthesis, as described
elsewhere in the specification.

The regulatory polynucleotides according to the invention may be part of a recombinant
expression vector that may be used to express a coding sequence in a desired host cell or host
organism. The recombinant expression vectors according to the invention are described elsewhere
in the specification.

A preferred 5'-regulatory polynucleotide of the invention includes the 5'-untranslated region
(5'-UTR) of the PG-3 cDNA, or a biologically active fragment or variant thereof.

A preferred 3'-regulatory polynucleotide of the invention includes the 3'-untranslated region
(3'-UTR) of the PG-3 cDNA, or a biologically active fragment or variant thereof.

A further object of the invention relates to a purified or isolated nucleic acid comprising:

a) a nucleic acid comprising a regulatory nucleotide sequence selected from the
group consisting of:

(i) a nucleotide sequence comprising a polynucleotide of the 5' regulatory
region or a complementary sequence thereto; or

(ii) a nucleotide sequence comprising a polynucleotide having at least 80,
85, 90, or 95% nucleotide identity with the nucleotide sequence of the 5'
regulatory region or a complementary sequence thereto; or

(iii) a nucleotide sequence comprising a polynucleotide that hybridizes
under stringent hybridization conditions with the nucleotide sequence of the 5'
regulatory region or a complementary sequence thereto; or

(iv) a biologically active fragment or variant of the polynucleotides in (i),
(ii) and (iii);

b) a polynucleotide encoding a desired polypeptide or a nucleic acid of interest,
operably linked to the nucleic acid defined in (a) above; and

c) optionally, a nucleic acid comprising a 3'-regulatory polynucleotide, preferably
a 3'-regulatory polynucleotide of the PG-3 gene.

In a specific embodiment of the nucleic acid defined above, said nucleic acid includes the
5'-untranslated region (5'-UTR) of the PG-3 cDNA, or a biologically active fragment or variant
thereof.

In a second specific embodiment of the nucleic acid defined above, said nucleic acid
includes the 3'-untranslated region (3'-UTR) of the PG-3 cDNA, or a biologically active fragment or
variant thereof.

The regulatory polynucleotide of the 5' regulatory region, or its biologically active 
fragments or variants, is operably linked at the 5'-end of the polynucleotide encoding the desired 
polypeptide or polynucleotide. 

The regulatory polynucleotide of the 3' regulatory region, or its biologically active
fragments or variants, is advantageously operably linked at the 3'-end of the polynucleotide
encoding the desired polypeptide or polynucleotide.

The desired polypeptide encoded by the above-described nucleic acid may be of various
kinds and origins, encompassing proteins of prokaryotic or eukaryotic origin. Among the
polypeptides which may be expressed under the control of a PG-3 regulatory region are bacterial,
fungal or viral antigens. Also encompassed are eukaryotic proteins such as intracellular proteins,
like "housekeeping" proteins, membrane-bound proteins, like receptors, and secreted proteins, like
endogenous mediators such as cytokines. The desired polypeptide may be the PG-3 protein,
especially the protein of the amino acid sequence of SEQ ID No 3, or a fragment or a variant
thereof.

The desired nucleic acid encoded by the above-described polynucleotide, usually an RNA
molecule, may be complementary to a desired coding polynucleotide, for example to the PG-3
coding sequence, and thus be useful as an antisense polynucleotide.

Such a polynucleotide may be included in a recombinant expression vector in order to
express the desired polypeptide or the desired nucleic acid in a host cell or in a host organism.
Suitable recombinant vectors that contain a polynucleotide such as described herein are disclosed
elsewhere in the specification.

POLYNUCLEOTIDE CONSTRUCTS

The terms "polynucleotide construct" and "recombinant polynucleotide" are used 
interchangeably herein to refer to linear or circular, purified or isolated polynucleotides that have 
been artificially designed and which comprise at least two nucleotide sequences that are not found 
as contiguous nucleotide sequences in their initial natural environment. 

DNA Construct That Enables Temporal And Spatial PG-3 Gene Expression In Recombinant Cell Hosts And In Transgenic Animals.

In order to study the physiological and phenotypic consequences of a lack of synthesis of
the PG-3 protein, both at the cell level and at the multicellular organism level, the invention also
encompasses DNA constructs and recombinant vectors enabling the conditional expression of a
specific allele of the PG-3 genomic sequence or cDNA, and also of a copy of this genomic sequence
or cDNA harboring substitutions, deletions, or additions of one or more bases with respect to the
PG-3 nucleotide sequence of SEQ ID Nos 1 and 2, or a fragment thereof, these base substitutions,
deletions or additions being located either in an exon, an intron or a regulatory sequence, but
preferably in the 5'-regulatory sequence or in an exon of the PG-3 genomic sequence or within the
PG-3 cDNA of SEQ ID No 2. In a preferred embodiment, the PG-3 sequence comprises a biallelic
marker of the present invention. In a further preferred embodiment, the PG-3 sequence comprises at
least one of the biallelic markers A1 to A80.

The present invention encompasses recombinant vectors comprising any one of the
polynucleotides described in the present invention. More particularly, the polynucleotide constructs
according to the present invention can comprise any of the polynucleotides described in the
"Genomic Sequences Of The PG3 Gene" section, the "PG-3 cDNA Sequences" section, the
"Coding Regions" section, and the "Oligonucleotide Probes And Primers" section.

A first preferred DNA construct is based on the tetracycline resistance operon tet from E.
coli transposon Tn10 for controlling PG-3 gene expression, such as described by Gossen et
al. (1992, 1995) and Furth et al. (1994). Such a DNA construct contains seven tet operator
sequences from Tn10 (tetop) that are fused to either a minimal promoter or a 5'-regulatory sequence
of the PG-3 gene, said minimal promoter or said PG-3 regulatory sequence being operably linked to
a polynucleotide of interest that codes either for a sense or an antisense oligonucleotide or for a
polypeptide, including a PG-3 polypeptide or a peptide fragment thereof. This DNA construct is
functional as a conditional expression system for the nucleotide sequence of interest when the same
cell also comprises a nucleotide sequence coding for either the wild-type (tTA) or the mutant (rTA)
repressor fused to the activating domain of the viral protein VP16 of herpes simplex virus, placed
under the control of a promoter, such as the HCMV IE1 enhancer/promoter or the MMTV-LTR.
Indeed, a preferred DNA construct of the invention comprises both the polynucleotide containing
the tet operator sequences and the polynucleotide containing a sequence coding for the tTA or the
rTA repressor.

In a specific embodiment, in which the conditional expression DNA construct contains the
sequence encoding the mutant tetracycline repressor rTA, expression of the polynucleotide of
interest is silent in the absence of tetracycline and induced in its presence.
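The induction logic of the two transactivators can be summarized as a small truth table. This is a schematic on/off model only, not a quantitative description of expression levels; it represents tetracycline (or an analog) simply as present or absent, and the function name is hypothetical.

```python
def tetop_expression_on(transactivator: str, tetracycline_present: bool) -> bool:
    """Schematic on/off model of a tetop-driven construct.

    tTA binds tetop and activates transcription only in the absence of tetracycline;
    the mutant rTA binds and activates transcription only in its presence.
    """
    if transactivator == "tTA":
        return not tetracycline_present
    if transactivator == "rTA":
        return tetracycline_present
    raise ValueError("transactivator must be 'tTA' or 'rTA'")

for ta in ("tTA", "rTA"):
    for tet in (False, True):
        state = "on" if tetop_expression_on(ta, tet) else "off"
        print(f"{ta}, tetracycline {'present' if tet else 'absent'}: {state}")
```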

DNA Constructs Allowing Homologous Recombination: Replacement Vectors 
A second preferred DNA construct comprises, from 5'-end to 3'-end: (a) a first nucleotide
sequence that is included within the PG-3 genomic sequence; (b) a nucleotide sequence comprising
a positive selection marker, such as the marker for neomycin resistance (neo); and (c) a second
nucleotide sequence that is included within the PG-3 genomic sequence, and is located on the
genome downstream of the first PG-3 nucleotide sequence (a).

In a preferred embodiment, this DNA construct also comprises a negative selection marker
located upstream of the nucleotide sequence (a) or downstream from the nucleotide sequence (c).
Preferably, the negative selection marker comprises the thymidine kinase (tk) gene (Thomas et
al., 1986), the hygromycin beta gene (Te Riele et al., 1990), the hprt gene (Van der Lugt et al.,
1991; Reid et al., 1990) or the Diphtheria toxin A fragment (Dt-A) gene (Nada et al., 1993; Yagi et
al., 1990). Preferably, the positive selection marker is located within a PG-3 exon sequence so as to
interrupt the sequence encoding a PG-3 protein. These replacement vectors are described, for
example, by Thomas et al. (1986; 1987), Mansour et al. (1988) and Roller et al. (1992).

The first and second nucleotide sequences (a) and (c) may each be located, without distinction,
within a PG-3 regulatory sequence, an intronic sequence, an exon sequence or a sequence containing
both regulatory and/or intronic and/or exon sequences. The size of each of the nucleotide sequences
(a) and (c) ranges from 1 to 50 kb, preferably from 1 to 10 kb, more preferably from 2 to 6 kb and
most preferably from 2 to 4 kb.
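The element order and the homology-arm size ranges given above can be checked mechanically when planning a replacement vector. The sketch below is a hypothetical planning helper: the element labels and arm lengths are invented for illustration, and only the nested kb ranges come from this paragraph.

```python
# Homology-arm size ranges (kb) stated above, from broadest to most preferred.
# Each range lies inside the previous one, so the last match is the narrowest.
ARM_RANGES_KB = [("acceptable", 1, 50), ("preferred", 1, 10),
                 ("more preferred", 2, 6), ("most preferred", 2, 4)]

def arm_tier(length_kb: float) -> str:
    """Return the narrowest stated range that contains the arm length."""
    tier = "outside stated range"
    for name, low, high in ARM_RANGES_KB:
        if low <= length_kb <= high:
            tier = name
    return tier

def replacement_vector(arm_a_kb: float, arm_c_kb: float, negative_marker: str = "tk"):
    """Hypothetical 5'->3' layout: arm (a), neo, arm (c), then a negative marker."""
    layout = [("arm (a)", arm_a_kb), ("neo", None), ("arm (c)", arm_c_kb),
              (negative_marker, None)]
    tiers = {"arm (a)": arm_tier(arm_a_kb), "arm (c)": arm_tier(arm_c_kb)}
    return layout, tiers

layout, tiers = replacement_vector(3.0, 5.5)
print([name for name, _ in layout])
print(tiers)
```

The negative marker is placed downstream of arm (c) here only as one of the two positions the text allows; placing it upstream of arm (a) is equally consistent with the description.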

DNA Constructs Allowing Homologous Recombination: Cre-LoxP System 
These new DNA constructs make use of the site-specific recombination system of the P1
phage. The P1 phage possesses a recombinase called Cre, which interacts specifically with a 34
base pair loxP site. The loxP site is composed of two palindromic sequences of 13 bp separated by
an 8 bp conserved sequence (Hoess et al., 1986). Recombination by the Cre enzyme between
two loxP sites having an identical orientation leads to the deletion of the intervening DNA fragment.
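The 13 + 8 + 13 = 34 bp structure and the excision outcome can be illustrated with the canonical P1 loxP sequence. The sketch below models Cre-mediated recombination between two directly repeated loxP sites as plain string excision that leaves a single loxP site behind; the flanking sequences are arbitrary placeholders, and the molecular mechanism itself is not modeled.

```python
# Canonical loxP: 13 bp inverted repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def cre_excise(seq: str, site: str = LOXP) -> str:
    """Delete the DNA between the first two directly repeated sites, keeping one site."""
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq  # a single site: nothing to excise
    return seq[:first] + seq[second:]

assert len(LOXP) == 34
# The two 13 bp arms are inverted repeats of each other.
assert LOXP[:13] == revcomp(LOXP[-13:])

# Placeholder construct: flanks plus a "marker" region written as N's.
construct = "GGGG" + LOXP + "N" * 20 + LOXP + "CCCC"
print(cre_excise(construct) == "GGGG" + LOXP + "CCCC")  # True
```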

The Cre-loxP system used in combination with a homologous recombination technique was
first described by Gu et al. (1993, 1994). Briefly, a nucleotide sequence of interest to be
inserted in a targeted location of the genome harbors at least two loxP sites in the same orientation,
located at the respective ends of a nucleotide sequence to be excised from the recombinant
genome. The excision event requires the presence of the recombinase (Cre) enzyme within the
nucleus of the recombinant cell host. The recombinase enzyme may be provided at the desired time
either by (a) incubating the recombinant cell hosts in a culture medium containing this enzyme, by
injecting the Cre enzyme directly into the desired cell, as described by Araki et al. (1995), or
by lipofection of the enzyme into the cells, as described by Baubonis et al. (1993); (b)
transfecting the cell host with a vector comprising the Cre coding sequence operably linked to a
promoter functional in the recombinant host cell, said promoter being optionally inducible, said
vector being introduced in the recombinant cell host, as described by Gu et al. (1993) and
Sauer et al. (1988); or (c) introducing in the genome of the cell host a polynucleotide comprising the
Cre coding sequence operably linked to a promoter functional in the recombinant cell host, which
promoter is optionally inducible, said polynucleotide being inserted in the genome of the cell
host either by a random insertion event or a homologous recombination event, as described
by Gu et al. (1994).

In a specific embodiment, the vector containing the sequence to be inserted in the PG-3
gene by homologous recombination is constructed in such a way that selectable markers are flanked
by loxP sites of the same orientation; it is then possible, by treatment with the Cre enzyme, to
eliminate the selectable markers while leaving the PG-3 sequences of interest that have been inserted
by a homologous recombination event. Again, two selectable markers are needed: a positive selection
marker to select for the recombination event and a negative selection marker to select for the
homologous recombination event. Vectors and methods using the Cre-loxP system are described by
Zou et al. (1994).

Thus, a third preferred DNA construct of the invention comprises, from 5'-end to 3'-end: (a)
a first nucleotide sequence that is included in the PG-3 genomic sequence; (b) a nucleotide sequence
comprising a polynucleotide encoding a positive selection marker, said nucleotide sequence
additionally comprising two sequences defining a site recognized by a recombinase, such as a loxP
site, the two sites being placed in the same orientation; and (c) a second nucleotide sequence that is
included in the PG-3 genomic sequence, and is located on the genome downstream of the first PG-3
nucleotide sequence (a).

The sequences defining a site recognized by a recombinase, such as a loxP site, are
preferably located within the nucleotide sequence (b) at suitable locations bordering the nucleotide
sequence for which conditional excision is sought. In one specific embodiment, two loxP sites
are located on either side of the positive selection marker sequence, in order to allow its excision
at a desired time after the occurrence of the homologous recombination event.

In a preferred embodiment of a method using the third DNA construct described above, the
excision of the polynucleotide fragment bordered by the two sites recognized by a recombinase,
preferably two loxP sites, is performed at a desired time, owing to the presence within the genome of
the recombinant host cell of a sequence encoding the Cre enzyme operably linked to a promoter
sequence, preferably an inducible promoter, more preferably a tissue-specific promoter sequence
and most preferably a promoter sequence which is both inducible and tissue-specific, such as 
described by Gu et al. (1994). 

The presence of the Cre coding sequence within the genome of the recombinant cell host may result
from the breeding of two transgenic animals, the first transgenic animal bearing the PG-3-derived
sequence of interest containing the loxP sites as described above and the second transgenic animal
bearing the Cre coding sequence operably linked to a suitable promoter sequence, as described
by Gu et al. (1994).

Spatio-temporal control of Cre enzyme expression may also be achieved with an
adenovirus-based vector that contains the Cre gene, thus allowing infection of cells, or in vivo
infection of organs, for delivery of the Cre enzyme, as described by Anton et al. (1995) and
Kanegae et al. (1995).

The DNA constructs described above may be used to introduce a desired nucleotide
sequence of the invention, preferably a PG-3 genomic sequence or a PG-3 cDNA sequence, and
most preferably an altered copy of a PG-3 genomic or cDNA sequence, within a predetermined
location of the targeted genome, leading either to the generation of an altered copy of a targeted
gene (knock-out homologous recombination) or to the replacement of a copy of the targeted gene by
another copy sufficiently homologous to allow a homologous recombination event to occur
(knock-in homologous recombination). In a specific embodiment, the DNA constructs described
above may be used to introduce a PG-3 genomic sequence or a PG-3 cDNA sequence comprising at
least one biallelic marker of the present invention, preferably at least one biallelic marker selected
from the group consisting of A1 to A80.

Nuclear Antisense DNA Constructs 

Other compositions comprise a vector of the invention comprising an oligonucleotide
fragment of the nucleic acid sequence of SEQ ID No 2, preferably a fragment including the start
codon of the PG-3 gene, as an antisense tool that inhibits the expression of the corresponding PG-3
gene. Preferred methods using antisense polynucleotides according to the present invention are the
procedures described by Sczakiel et al. (1995) or those described in PCT Application No. WO
95/24223.

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that
are complementary to the 5' end of the PG-3 mRNA. In one embodiment, a combination of different
antisense polynucleotides complementary to different parts of the desired targeted gene is used.

Preferred antisense polynucleotides according to the present invention are complementary
to a sequence of the PG-3 mRNAs that contains either the translation initiation codon ATG or a
splicing site. Further preferred antisense polynucleotides according to the invention are
complementary to the splicing site of the PG-3 mRNA.
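An antisense oligonucleotide is the reverse complement of its target stretch of mRNA. As an illustration only, the sketch below takes a window of an mRNA (written with DNA letters) around the first ATG and returns the complementary antisense sequence. The input sequence, the window sizes, and the use of the first ATG as the initiation codon are hypothetical simplifications; the actual PG-3 start codon is defined by SEQ ID No 2, which is not reproduced here.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA-letter sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def antisense_over_start_codon(mrna: str, upstream: int = 10, downstream: int = 10) -> str:
    """Antisense sequence covering up to `upstream` nt 5' of the first ATG
    through `downstream` nt 3' of it (hypothetical window choice)."""
    mrna = mrna.upper()
    start = mrna.find("ATG")
    if start == -1:
        raise ValueError("no ATG found")
    window = mrna[max(0, start - upstream): start + 3 + downstream]
    return reverse_complement(window)

# Hypothetical 5' mRNA fragment; this is not the PG-3 sequence.
mrna_5prime = "GCCGCCACCATGGCTTCTAAAGGAGAG"
oligo = antisense_over_start_codon(mrna_5prime)
print(oligo, len(oligo))
```

The resulting 22-mer falls inside the 15-200 bp range given above; the window size is a free design parameter in this sketch.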

Preferably, the antisense polynucleotides of the invention have a 3' polyadenylation signal
that has been replaced with a self-cleaving ribozyme sequence, such that RNA polymerase II
transcripts are produced without poly(A) at their 3' ends; these antisense polynucleotides are
therefore incapable of export from the nucleus, as described by Liu et al. (1994). In a preferred
embodiment, these PG-3 antisense polynucleotides also comprise, within the ribozyme cassette, a
histone stem-loop structure to stabilize cleaved transcripts against 3'-5' exonucleolytic degradation,
such as the structure described by Eckner et al. (1991).

Oligonucleotide Probes And Primers 
Polynucleotides derived from the PG-3 gene are useful in order to detect the presence of at
least a copy of a nucleotide sequence of SEQ ID No 1, or a fragment, complement, or variant
thereof in a test sample.

Particularly preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-97921, 98517-103471, 103603-108222, 108390-109221,
109324-114409, 114538-115723, 115957-122102, 122225-126876, 127033-157212, 157808-240825.
Additional preferred probes and primers of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70,
80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements thereof,
wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-10000, 10001-20000, 20001-30000, 30001-40000, 40001-50000,
50001-60000, 60001-70000, 70001-80000, 80001-90000, 90001-97921, 98517-103471,
103603-108222, 108390-109221, 109324-114409, 114538-115723, 115957-122102, 122225-126876,
127033-157212, 157808-159000, 159001-160000, 160001-170000, 170001-180000,
180001-190000, 190001-200000, 200001-210000, 210001-220000, 220001-230000, 230001-240825.
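Whether a candidate span "comprises" at least 1, 2, 3, 5, or 10 of the listed positions is an interval-overlap count. The sketch below uses the first list of positions above and assumes 1-based, inclusive coordinates, which is an assumption about how the ranges are meant to be read; the example spans are hypothetical.

```python
# First list of SEQ ID No 1 position ranges given above (1-based, inclusive).
RANGES = [(1, 97921), (98517, 103471), (103603, 108222), (108390, 109221),
          (109324, 114409), (114538, 115723), (115957, 122102), (122225, 126876),
          (127033, 157212), (157808, 240825)]

def listed_positions_in_span(start: int, end: int, ranges=RANGES) -> int:
    """Count listed positions that fall inside the span [start, end]."""
    total = 0
    for low, high in ranges:
        overlap = min(end, high) - max(start, low) + 1
        if overlap > 0:
            total += overlap
    return total

def qualifies(start: int, end: int, min_positions: int = 1) -> bool:
    return listed_positions_in_span(start, end) >= min_positions

# A hypothetical 20-nt span straddling the end of the first range at 97921.
print(listed_positions_in_span(97910, 97929))  # 12
print(qualifies(97930, 97949))  # False: lies entirely within the unlisted gap
```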

Another object of the invention is a purified, isolated, or recombinant nucleic acid
comprising the nucleotide sequence of SEQ ID No 2, complementary sequences thereto, as well as
allelic variants, and fragments thereof. Moreover, preferred probes and primers of the invention
include purified, isolated, or recombinant PG-3 cDNAs consisting of, consisting essentially of, or
comprising the sequence of SEQ ID No 2. Particularly preferred probes and primers of the
invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span
of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides
of SEQ ID No 2 or the complements thereof. Additional preferred embodiments of the invention
include probes and primers comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 2 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 2: 1-500, 501-1000, 1001-1500, 1501-2000, 2001-2500, 2501-3000,
3001-3500, 3501-3809.

Thus, the invention also relates to nucleic acid probes characterized in that they hybridize
specifically, under the stringent hybridization conditions defined above, with a nucleic acid selected
from the group consisting of the nucleotide sequences 1-97921, 98517-103471, 103603-108222,
108390-109221, 109324-114409, 114538-115723, 115957-122102, 122225-126876,
127033-157212 and 157808-240825 of SEQ ID No 1 or a variant thereof or a sequence
complementary thereto. The invention also relates to nucleic acid probes characterized in that they
hybridize specifically, under the stringent hybridization conditions defined above, with a nucleic acid
of SEQ ID No 2 or a variant or a fragment thereof or a sequence complementary thereto.

In one embodiment the invention encompasses isolated, purified, and recombinant
polynucleotides consisting of, or consisting essentially of, a contiguous span of at least 8, 10, 12, 15,
18, 20, 25, 30, 35, 40, or 50 nucleotides in length of any one of SEQ ID Nos 1 and 2 and the
complements thereof, wherein said span includes a PG-3-related biallelic marker in said sequence;
optionally, said PG-3-related biallelic marker is selected from the group consisting of A1 to A80,
and the complements thereof, or optionally the biallelic markers in linkage disequilibrium
therewith; optionally, wherein said PG-3-related biallelic marker is selected from the group
consisting of A1 to A5 and A8 to A80, and the complements thereof, or optionally the biallelic
markers in linkage disequilibrium therewith; optionally, wherein said PG-3-related biallelic marker
is selected from the group consisting of A6 and A7, and the complements thereof, or optionally the
biallelic markers in linkage disequilibrium therewith; optionally, said contiguous span is 18 to 35
nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said
polynucleotide; optionally, said polynucleotide comprises, consists essentially of, or consists of
said contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker
is at the center of said polynucleotide; optionally, the 3' end of said contiguous span is present at
the 3' end of said polynucleotide; and optionally, the 3' end of said contiguous span is located at
the 3' end of said polynucleotide and said biallelic marker is present at the 3' end of said
polynucleotide. In a preferred embodiment, said probes comprise, consist of, or consist
essentially of a sequence selected from the following sequences: P1 to P4 and P6 to P80 and the
complementary sequences thereto.
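The 25-nucleotide probe with the biallelic marker at its center can be sketched as a window extraction. The sequence and marker position below are hypothetical (not taken from SEQ ID No 1 or 2), coordinates are assumed to be 1-based as in a sequence listing, and the allele at the marker is substituted to yield one probe per allele.

```python
def centered_probe(seq: str, marker_pos: int, allele: str, length: int = 25) -> str:
    """Return a `length`-nt probe whose central base is `allele` at marker_pos (1-based)."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the marker sits exactly at the center")
    half = length // 2
    i = marker_pos - 1  # convert to a 0-based index
    if i - half < 0 or i + half >= len(seq):
        raise ValueError("marker too close to the end of the sequence")
    return seq[i - half:i] + allele + seq[i + 1:i + half + 1]

# Hypothetical 41-nt context with an A/G biallelic marker at position 21.
context = "TTGACCTGAGGCTAGCATCGACGTTAGCCATGCAATGGTCA"
for allele in ("A", "G"):
    probe = centered_probe(context, 21, allele)
    print(probe, len(probe), probe[12])
```

The two allele-specific probes differ only at the central base, which is the position where a single mismatch most strongly destabilizes the hybrid.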

In another embodiment the invention encompasses isolated, purified or recombinant
polynucleotides comprising, consisting of, or consisting essentially of a contiguous span of at least
8, 10, 12, 15, 18, 20, 25, 30, 35, 40, or 50 nucleotides in length of SEQ ID Nos 1 and 2, or the
complements thereof, wherein the 3' end of said contiguous span is located at the 3' end of said
polynucleotide, and wherein the 3' end of said polynucleotide is located within 20 nucleotides
upstream of a PG-3-related biallelic marker in said sequence; optionally, wherein said PG-3-related
biallelic marker is selected from the group consisting of A1 to A80, and the complements thereof,
or optionally the biallelic markers in linkage disequilibrium therewith; optionally, wherein said
PG-3-related biallelic marker is selected from the group consisting of A1 to A5 and A8 to A80, and
the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith;
optionally, wherein said PG-3-related biallelic marker is selected from the group consisting of A6
and A7, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium
therewith; optionally, wherein the 3' end of said polynucleotide is located 1 nucleotide upstream of
said PG-3-related biallelic marker in said sequence; and optionally, wherein said polynucleotide
consists essentially of a sequence selected from the following sequences: D1 to D4, D6 to D80, E1
to E4 and E6 to E80.
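For a primer whose 3' end lies 1 nucleotide upstream of the marker, as used in single-base extension (microsequencing) assays, the primer is the stretch immediately 5' of the marker on the chosen strand, and the next base a polymerase adds reports the allele. The sketch below reuses a hypothetical sequence with 1-based coordinates; the 20-nt primer length and the minus-strand design via reverse complement are illustrative choices, not the actual D or E primers.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def primer_ending_upstream(seq: str, marker_pos: int, length: int = 20,
                           strand: str = "+") -> str:
    """Primer whose 3' end is the base immediately upstream of the marker (1-based)."""
    i = marker_pos - 1  # 0-based index of the marker on the given strand
    if strand == "-":
        seq = revcomp(seq)
        i = len(seq) - 1 - i  # marker index on the reverse complement
    if i - length < 0:
        raise ValueError("not enough upstream sequence")
    return seq[i - length:i]

def expected_extension(allele: str, strand: str = "+") -> str:
    """Base a polymerase would add at the marker when extending the primer."""
    return allele if strand == "+" else revcomp(allele)

context = "TTGACCTGAGGCTAGCATCGACGTTAGCCATGCAATGGTCA"  # hypothetical, A/G marker at 21
fwd = primer_ending_upstream(context, 21)
rev = primer_ending_upstream(context, 21, strand="-")
print(fwd, expected_extension("A"), expected_extension("G"))
print(rev, expected_extension("A", "-"), expected_extension("G", "-"))
```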

In a further embodiment, the invention encompasses isolated, purified, or recombinant
polynucleotides comprising, consisting of, or consisting essentially of a sequence selected from the
following sequences: B1 to B52 and C1 to C52.

In an additional embodiment, the invention encompasses polynucleotides for use in
hybridization assays, sequencing assays, and enzyme-based mismatch detection assays for
determining the identity of the nucleotide at a PG-3-related biallelic marker in SEQ ID Nos 1 and 2,
as well as polynucleotides for use in amplifying segments of nucleotides comprising a PG-3-related
biallelic marker in SEQ ID Nos 1 and 2; optionally, wherein said PG-3-related biallelic marker is
selected from the group consisting of A1 to A80, and the complements thereof, or optionally the
biallelic markers in linkage disequilibrium therewith; optionally, wherein said PG-3-related
biallelic marker is selected from the group consisting of A1 to A5 and A8 to A80, and the
complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith;
optionally, wherein said PG-3-related biallelic marker is selected from the group consisting of A6
and A7, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium
therewith.

The invention concerns the use of the polynucleotides according to the invention for
determining the identity of the nucleotide at a PG-3-related biallelic marker, preferably in a
hybridization assay, a sequencing assay, a microsequencing assay, or an enzyme-based mismatch
detection assay, and for amplifying segments of nucleotides comprising a PG-3-related biallelic
marker.

A probe or a primer according to the invention is between 8 and 1000 nucleotides in length,
or is specified to be at least 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000
nucleotides in length. More particularly, the length of these probes and primers can range from 8,
10, 15, 20, or 30 to 100 nucleotides, preferably from 10 to 50, more preferably from 15 to 30
nucleotides. Shorter probes and primers tend to lack specificity for a target nucleic acid sequence
and generally require cooler temperatures to form sufficiently stable hybrid complexes with the
template. Longer probes and primers are expensive to produce and can sometimes self-hybridize to
form hairpin structures. The appropriate length for primers and probes under a particular set of
assay conditions may be empirically determined by one of skill in the art. A preferred probe or
primer consists of a nucleic acid comprising a polynucleotide selected from the group of the
nucleotide sequences of P1 to P4 and P6 to P80 and the complementary sequences thereto, B1 to
B52, C1 to C52, D1 to D4, D6 to D80, E1 to E4 and E6 to E80, for which the respective locations in
the sequence listing are provided in Tables 1, 2, and 3.

The formation of stable hybrids depends on the melting temperature (Tm) of the DNA. The
Tm depends on the length of the primer or probe, the ionic strength of the solution and the G+C
content. The higher the G+C content of the primer or probe, the higher the melting temperature,
because G:C pairs are held by three hydrogen bonds whereas A:T pairs have only two. The GC content
of the probes of the invention usually ranges between 10 and 75 %, preferably between 35 and 60 %,
and more preferably between 40 and 55 %.
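The text names the factors that govern Tm but gives no formula. As a rough illustration only, the sketch below computes the G+C percentage of an oligonucleotide and a Tm estimate by the Wallace rule (2 °C per A or T, 4 °C per G or C). That rule of thumb for short oligonucleotides is swapped in here and does not come from this document; it ignores ionic strength. The function names are hypothetical. The sketch also checks the 40 to 55 % window stated above.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a DNA oligonucleotide."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def wallace_tm(seq: str) -> float:
    """Rough Tm in degrees C by the Wallace rule: 2 per A/T, 4 per G/C.

    Assumption: only a rule of thumb for short oligos (about 14 nt or less);
    it ignores salt concentration and nearest-neighbour stacking effects.
    """
    seq = seq.upper()
    at = sum(1 for base in seq if base in "AT")
    gc = sum(1 for base in seq if base in "GC")
    return 2.0 * at + 4.0 * gc


def in_preferred_gc_range(seq: str, low: float = 40.0, high: float = 55.0) -> bool:
    """Check the 'more preferred' 40-55 % GC window quoted in the text."""
    return low <= gc_content(seq) <= high
```

For example, the 12-mer ACGTACGTACGT has 50 % G+C and a Wallace estimate of 36 °C. For probes in the preferred 15 to 30 nucleotide range, salt-corrected nearest-neighbour models are generally more reliable than this simple count.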

The primers and probes can be prepared by any suitable method, including, for example,
cloning and restriction of appropriate sequences and direct chemical synthesis by a method such as
the phosphotriester method of Narang et al. (1979), the phosphodiester method of Brown et
al. (1979), the diethylphosphoramidite method of Beaucage et al. (1981) and the solid support
method described in EP 0 707 592.

Detection probes are generally nucleic acid sequences or uncharged nucleic acid analogs
such as, for example, peptide nucleic acids, which are disclosed in International Patent Application
WO 92/20702, or morpholino analogs, which are described in U.S. Patent Nos. 5,185,444;
5,034,506 and 5,142,047. The probe may have to be rendered "non-extendable" in that additional
dNTPs cannot be added to the probe. Analogs are usually non-extendable in and of themselves, and
nucleic acid probes can be rendered non-extendable by modifying the 3' end of the probe such that
the hydroxyl group is no longer capable of participating in elongation. For example, the 3' end of
the probe can be functionalized with the capture or detection label to thereby consume or otherwise
block the hydroxyl group. Alternatively, the 3' hydroxyl group simply can be cleaved, replaced or
modified. U.S. Patent Application Serial No. 07/049,061, filed April 19, 1993, describes
modifications which can be used to render a probe non-extendable.

Any of the polynucleotides of the present invention can be labeled, if desired, by
incorporating any label known in the art to be detectable by spectroscopic, photochemical,
biochemical, immunochemical, or chemical means. For example, useful labels include radioactive
substances (including ³²P, ³⁵S, ³H, ¹²⁵I), fluorescent dyes (including 5-bromodeoxyuridine,
fluorescein, acetylaminofluorene, digoxigenin) or biotin. Preferably, polynucleotides are labeled at
their 3' and 5' ends. Examples of non-radioactive labeling of nucleic acid fragments are described
in French patent No. FR-7810975, or by Urdea et al. (1988) or Sanchez-Pescador et al. (1988). In
addition, the probes according to the present invention may have structural characteristics that
allow signal amplification; such structural characteristics include, for example, branched
DNA probes such as those described by Urdea et al. (1991) or in European patent No. EP 0 225 807
(Chiron).

A label can also be used to capture the primer, so as to facilitate the immobilization of
either the primer or a primer extension product, such as amplified DNA, on a solid support. A
capture label is attached to the primers or probes and can be a specific binding member which forms
a binding pair with the solid phase reagent's specific binding member (e.g. biotin and
streptavidin). Therefore, depending upon the type of label carried by a polynucleotide or a probe, it
may be employed to capture or to detect the target DNA. Further, it will be understood that the
polynucleotides, primers or probes provided herein may themselves serve as the capture label.
For example, in the case where a solid phase reagent's binding member is a nucleic acid sequence,
it may be selected such that it binds a complementary portion of a primer or probe to thereby
immobilize the primer or probe to the solid phase. In cases where a polynucleotide probe itself
serves as the binding member, those skilled in the art will recognize that the probe will contain a
sequence or "tail" that is not complementary to the target. In the case where a polynucleotide
primer itself serves as the capture label, at least a portion of the primer will be free to hybridize with
a nucleic acid on a solid phase. DNA labeling techniques are well known to the skilled technician.

The probes of the present invention are useful for a number of purposes. They can notably be
used in Southern hybridization to genomic DNA. The probes can also be used to detect
PCR amplification products. They may also be used to detect mismatches in the PG-3 gene or
mRNA using other techniques.

Any of the polynucleotides, primers and probes of the present invention can be
conveniently immobilized on a solid support. Solid supports are known to those skilled in the art
and include the walls of wells of a reaction tray, test tubes, polystyrene beads, magnetic beads,
nitrocellulose strips, membranes, microparticles such as latex particles, sheep (or other animal) red
blood cells, duracytes and others. The solid support is not critical and can be selected by one skilled
in the art. Thus, latex particles, microparticles, magnetic or non-magnetic beads, membranes,
plastic tubes, walls of microtiter wells, glass or silicon chips, sheep (or other suitable animal's) red
blood cells and duracytes are all suitable examples. Suitable methods for immobilizing nucleic
acids on solid phases include ionic, hydrophobic and covalent interactions and the like. A solid
support, as used herein, refers to any material which is insoluble, or can be made insoluble by a
subsequent reaction. The solid support can be chosen for its intrinsic ability to attract and
immobilize the capture reagent. Alternatively, the solid phase can retain an additional receptor
which has the ability to attract and immobilize the capture reagent. The additional receptor can
include a charged substance that is oppositely charged with respect to the capture reagent itself or to
a charged substance conjugated to the capture reagent. As yet another alternative, the receptor
molecule can be any specific binding member which is immobilized upon (attached to) the solid
support and which has the ability to immobilize the capture reagent through a specific binding
reaction. The receptor molecule enables the indirect binding of the capture reagent to a solid
support material before the performance of the assay or during the performance of the assay.

allowing the amplification of a nucleic acid containing the polymorphic base of one PG-3 biallelic
marker are listed in Table 1 of Example 2. 

Eight PG-3-related biallelic markers, A3, A6, A7, A14, A70, A71, A72 and A80, are located
in the exonic regions of the genomic sequence of PG-3 at the following positions of SEQ ID No 1:
10228, 39944, 39973, 76060, 216026, 216082, 216218 and 237555, respectively. They are located in
exons C, T, I, K and L of the PG-3 gene. Their respective positions in the cDNA and protein
sequences are given in Table 2.

The invention also relates to a purified and/or isolated nucleotide sequence comprising a
polymorphic base of a PG-3-related biallelic marker, preferably of a biallelic marker selected from
the group consisting of A1 to A80, and the complements thereof. The sequence is between 8 and
1000 nucleotides in length, and preferably comprises at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50,
60, 70, 80, 100, 250, 500 or 1000 contiguous nucleotides of a nucleotide sequence selected from the
group consisting of SEQ ID Nos 1 and 2 or a variant thereof or a complementary sequence thereto.
These nucleotide sequences comprise the polymorphic base of either allele 1 or allele 2 of the
considered biallelic marker. Optionally, said biallelic marker may be within 6, 5, 4, 3, 2, or 1
nucleotides of the center of said polynucleotide or at the center of said polynucleotide. Optionally,
the 3' end of said contiguous span may be present at the 3' end of said polynucleotide. Optionally,
said biallelic marker may be present at the 3' end of said polynucleotide. Optionally, said
polynucleotide may further comprise a label. Optionally, said polynucleotide can be attached to a
solid support. In a further embodiment, the polynucleotides defined above can be used alone or in
any combination.
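To make the "marker at the center" option concrete, the following sketch cuts one probe per allele from a reference sequence so that the polymorphic base sits exactly in the middle. The toy sequence, the 0-based indexing convention and the function name `allele_probes` are illustrative assumptions; real flanking sequences would come from SEQ ID No 1 and the marker positions in the tables.

```python
def allele_probes(seq: str, marker_pos: int, alleles: tuple, length: int) -> dict:
    """Return one probe per allele with the polymorphic base at the probe centre.

    seq        : reference sequence (hypothetical, not taken from SEQ ID No 1)
    marker_pos : 0-based index of the polymorphic base in seq
    alleles    : the two bases observed at that position, e.g. ("A", "G")
    length     : odd probe length, so that a single centre position exists
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so the marker sits exactly at the centre")
    flank = length // 2
    start, end = marker_pos - flank, marker_pos + flank + 1
    if start < 0 or end > len(seq):
        raise ValueError("marker too close to the sequence end for this probe length")
    left, right = seq[start:marker_pos], seq[marker_pos + 1:end]
    # Same flanks for both probes; only the centre base differs between alleles.
    return {allele: left + allele + right for allele in alleles}
```

For instance, `allele_probes("GATTACAGGCATCCA", 7, ("G", "T"), 5)` yields the allele-specific 5-mers CAGGC and CATGC. A marker "within 1 to 6 nucleotides of the center" would simply shift `start` and `end` by that offset.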

The invention also relates to a purified and/or isolated nucleotide sequence comprising a
sequence between 8 and 1000 nucleotides in length, and preferably at least 8, 10, 12, 15, 18, 20, 25,
35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000 contiguous nucleotides of a nucleotide sequence
selected from the group consisting of SEQ ID Nos 1 and 2 or a variant thereof or a complementary
sequence thereto. Optionally, the 3' end of said polynucleotide may be located within or at least 2,
4, 6, 8, 10, 12, 15, 18, 20, 25, 50, 100, 250, 500, or 1000 nucleotides upstream of a PG-3-related
biallelic marker in said sequence. Optionally, said PG-3-related biallelic marker is selected from
the group consisting of A1 to A80. Optionally, the 3' end of said polynucleotide may be located 1
nucleotide upstream of a PG-3-related biallelic marker in said sequence. Optionally, said
polynucleotide may further comprise a label. Optionally, said polynucleotide can be attached to a
solid support. In a further embodiment, the polynucleotides defined above can be used alone or in
any combination.
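A polynucleotide whose 3' end lies 1 nucleotide upstream of a biallelic marker is the arrangement typically used for microsequencing (single-base primer extension): the first base added opposite the template reads the allele. The sketch below shows one way to derive such a primer on either strand. The helper names, the 0-based coordinates and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def forward_primer(seq: str, marker_pos: int, length: int, distance: int = 1) -> str:
    """Primer on the given strand whose 3' end lies `distance` nt upstream of the marker.

    With distance=1 the primer's last base is immediately 5' of the
    polymorphic base, so a single-base extension reads the marker allele.
    marker_pos is a 0-based index into seq.
    """
    end = marker_pos - distance          # index of the primer's 3'-terminal base
    start = end - length + 1
    if start < 0 or end < 0:
        raise ValueError("not enough upstream sequence for this primer")
    return seq[start:end + 1]


def reverse_primer(seq: str, marker_pos: int, length: int, distance: int = 1) -> str:
    """Same idea on the complementary strand: its 3' end pairs with the base
    `distance` nt downstream of the marker in the coordinates of seq."""
    start = marker_pos + distance        # forward-strand index paired with the 3' base
    end = start + length
    if end > len(seq):
        raise ValueError("not enough downstream sequence for this primer")
    return reverse_complement(seq[start:end])
```

With a distance greater than 1, the same functions give primers ending further upstream, matching the 2 to 1000 nucleotide options listed above.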

In a preferred embodiment, the sequences comprising a polymorphic base of one of the
biallelic markers listed in Table 2 are selected from the group consisting of the nucleotide sequences
comprising, consisting essentially of, or consisting of the amplicons listed in Table 1 or a variant
thereof or a complementary sequence thereto.

The solid phase thus can be a plastic, derivatized plastic, magnetic or non-magnetic metal, glass or
silicon surface of a test tube, microtiter well, sheet, bead, microparticle, chip, sheep (or other
suitable animal's) red blood cells, duracytes® and other configurations known to those of ordinary
skill in the art. The polynucleotides of the invention can be attached to or immobilized on a solid
support individually or in groups of at least 2, 5, 8, 10, 12, 15, 20, or 25 distinct polynucleotides of
the invention to a single solid support. In addition, polynucleotides other than those of the
invention may be attached to the same solid support as one or more polynucleotides of the
invention.

Consequently, the invention also relates to a method for detecting the presence, in a sample,
of a nucleic acid comprising a nucleotide sequence selected from the group consisting of SEQ ID
Nos 1 and 2, a fragment or a variant thereof and a complementary sequence thereto, said method
comprising the following steps:

a) bringing into contact a nucleic acid probe or a plurality of nucleic acid probes
which can hybridize with a nucleotide sequence included in a nucleic acid selected from the
group consisting of the nucleotide sequences of SEQ ID Nos 1 and 2, a fragment or a
variant thereof and a complementary sequence thereto and the sample to be assayed; and

b) detecting the hybrid complex formed between the probe and a nucleic acid in the
sample.
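As an in-silico analogue of steps a) and b), the sketch below reports where a probe would form a perfectly complementary, antiparallel duplex with a target sequence. It models only exact Watson-Crick matches, not hybridization stringency or tolerated mismatches. The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def perfect_match_sites(probe: str, target: str) -> list:
    """0-based start positions in `target` (written 5'->3') where `probe`
    would pair base-for-base, i.e. where its reverse complement occurs."""
    site = reverse_complement(probe)
    target = target.upper()
    n = len(site)
    return [i for i in range(len(target) - n + 1) if target[i:i + n] == site]


def detect(probes: list, target: str) -> dict:
    """Step b) stand-in: map each probe to the sites where a hybrid would form."""
    return {probe: perfect_match_sites(probe, target) for probe in probes}
```

An empty list for every probe corresponds to "no hybrid complex detected"; a laboratory assay would instead read label signal after washing under chosen stringency conditions.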

The invention further concerns a kit for detecting the presence, in a sample, of a nucleic acid
comprising a nucleotide sequence selected from the group consisting of SEQ ID Nos 1 and 2, a
fragment or a variant thereof and a complementary sequence thereto, said kit comprising:

a) a nucleic acid probe or a plurality of nucleic acid probes which can hybridize
with a nucleotide sequence included in a nucleic acid selected from the group consisting of
the nucleotide sequences of SEQ ID Nos 1 and 2, a fragment or a variant thereof and a
complementary sequence thereto; and

b) optionally, the reagents necessary for performing the hybridization reaction.

In a first preferred embodiment of this detection method and kit, said nucleic acid probe or
the plurality of nucleic acid probes are labeled with a detectable molecule. In a second preferred
embodiment of said method and kit, said nucleic acid probe or the plurality of nucleic acid probes
have been immobilized on a substrate. In a third preferred embodiment, the nucleic acid probe or the
plurality of nucleic acid probes comprise either a sequence which is selected from the group
consisting of the nucleotide sequences of P1 to P4 and P6 to P80 and the complementary sequence
thereto, B1 to B52, C1 to C52, D1 to D4, D6 to D80, E1 to E4 and E6 to E80, or a biallelic marker
selected from the group consisting of A1 to A80 and the complements thereof.

Oligonucleotide Arrays

A substrate comprising a plurality of oligonucleotide primers or probes of the invention
may be used either for detecting or amplifying targeted sequences in the PG-3 gene and may also be 
used for detecting mutations in the coding or in the non-coding sequences of the PG-3 gene. 

Any polynucleotide provided herein may be attached in overlapping areas or at random
locations on the solid support. Alternatively, the polynucleotides of the invention may be attached
in an ordered array wherein each polynucleotide is attached to a distinct region of the solid support
which does not overlap with the attachment site of any other polynucleotide. Preferably, such an
ordered array of polynucleotides is designed to be "addressable", where the distinct locations are
recorded and can be accessed as part of an assay procedure. Addressable polynucleotide arrays
typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a
substrate in different known locations. The knowledge of the precise location of each
polynucleotide makes these "addressable" arrays particularly useful in hybridization assays. Any
addressable array technology known in the art can be employed with the polynucleotides of the
invention. One particular embodiment of these polynucleotide arrays, known as GeneChips™,
has been generally described in US Patent 5,143,854 and PCT publications WO 90/15070 and
WO 92/10092. These arrays may generally be produced using mechanical synthesis methods or
light-directed synthesis methods which incorporate a combination of photolithographic methods and
solid phase oligonucleotide synthesis (Fodor et al., 1991). The immobilization of arrays of
oligonucleotides on solid supports has been made possible by the development of a technology
generally identified as "Very Large Scale Immobilized Polymer Synthesis" (VLSIPS™) in which,
typically, probes are immobilized in a high-density array on a solid surface of a chip. Examples of
VLSIPS™ technologies are provided in US Patents 5,143,854 and 5,412,087 and in PCT
Publications WO 90/15070, WO 92/10092 and WO 95/11995, which describe methods for forming
oligonucleotide arrays through techniques such as light-directed synthesis. In designing
strategies aimed at providing arrays of nucleotides immobilized on solid supports, further
presentation strategies were developed to order and display the oligonucleotide arrays on the chips
in an attempt to maximize hybridization patterns and sequence information. Examples of such
presentation strategies are disclosed in PCT Publications WO 94/12305, WO 94/11530,
WO 97/29212 and WO 97/31256.

In another embodiment of the oligonucleotide arrays of the invention, an oligonucleotide
probe matrix may advantageously be used to detect mutations occurring in the PG-3 gene and
preferably in its regulatory region. For this particular purpose, probes are specifically designed to
have a nucleotide sequence allowing their hybridization to the genes that carry known mutations
(either by deletion, insertion or substitution of one or several nucleotides). By known mutations is
meant mutations in the PG-3 gene that have been identified, for example, according to the
technique used by Huang et al. (1996) or Samson et al. (1996).

Another technique that may be used to detect mutations in the PG-3 gene is the use of a
high-density DNA array. Each oligonucleotide probe constituting a unit element of the high-density
DNA array is designed to match a specific subsequence of the PG-3 genomic DNA or cDNA.
Thus, an array consisting of oligonucleotides complementary to subsequences of the target gene
sequence is used to determine the identity of the target sequence within a sample, measure its
amount, and detect differences between the target sequence and the sequence of the PG-3 gene in
the sample. In one such design, termed a 4L tiled array, a set of four probes (A, C, G, T), preferably
15-nucleotide oligomers, is used for each position of the target. In each set of four probes, the
perfect complement will hybridize more strongly than the mismatched probes. Consequently, a
nucleic acid target of length L is scanned for mutations with a tiled array containing 4L probes, the
whole probe set containing all possible single-base substitutions in the known sequence. The
hybridization signals of the 15-mer probe set of the tiled array are perturbed by a single base change
in the target sequence. As a consequence, there is a characteristic loss of signal, or "footprint", for
the probes flanking a mutation position. This technique was described by Chee et al. (1996).
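The tiling logic described above can be simulated in a few lines. This sketch builds the four substituted probes for each interrogated position and uses an exact-match rule as a stand-in for "strongest hybridization signal". With a single-base change in the sample, the flanking positions lose their call, which reproduces the footprint. The names `tiled_array` and `call_bases` are hypothetical. Positions closer than k // 2 to either end are skipped, so the set holds 4(L - k + 1) rather than exactly 4L probes. Real arrays are read from measured intensities, not string comparison.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tiled_array(reference: str, k: int = 15) -> dict:
    """Four probes per interrogated position, keyed by (position, base).

    Each probe is complementary to the k-nt reference window centred on the
    position, except that the centre base is replaced by A, C, G or T.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so each window has a single centre base")
    half = k // 2
    probes = {}
    for pos in range(half, len(reference) - half):
        window = reference[pos - half:pos + half + 1]
        for base in BASES:
            variant = window[:half] + base + window[half + 1:]
            probes[(pos, base)] = reverse_complement(variant)
    return probes


def call_bases(probes: dict, sample: str, k: int = 15) -> dict:
    """Toy readout: a position is called only if one of its four probes is an
    exact complement of the sample window (stand-in for the strongest signal)."""
    half = k // 2
    calls = {}
    for (pos, base), probe in probes.items():
        if reverse_complement(probe) == sample[pos - half:pos + half + 1]:
            calls[pos] = base
    return calls
```

For instance, with k = 5 and the 12-nucleotide reference ACGTTGCAACGT, a sample carrying T instead of C at position 6 is called as T at position 6. Positions 4, 5, 7 and 8, whose windows overlap the change, receive no call. That gap is the footprint.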

Consequently, the invention concerns an array of nucleic acid molecules comprising at least
one polynucleotide described above as probes and primers. Preferably, the invention concerns an
array of nucleic acids comprising at least two polynucleotides described above as probes and
primers.

A further object of the invention consists of an array of nucleic acid sequences comprising
either at least one of the sequences selected from the group consisting of P1 to P4 and P6 to P80, B1
to B52, C1 to C52, D1 to D4, D6 to D80, E1 to E4 and E6 to E80, the sequences complementary
thereto, or a fragment thereof of at least 8, 10, 12, 15, 18, or 20 consecutive nucleotides, or at
least one sequence comprising a biallelic marker selected from the group consisting of A1 to A80
and the complements thereof.

The invention also pertains to an array of nucleic acid sequences comprising either at least
two of the sequences selected from the group consisting of P1 to P4, P6 to P80, B1 to B52, C1 to
C52, D1 to D4, D6 to D80, E1 to E4 and E6 to E80, the sequences complementary thereto, or a
fragment thereof of at least 8 consecutive nucleotides, or at least two sequences comprising
a biallelic marker selected from the group consisting of A1 to A80 and the complements thereof.

PG-3 PROTEINS AND POLYPEPTIDE FRAGMENTS

The term "PG-3 polypeptides" is used herein to embrace all of the proteins and
polypeptides of the present invention. Also forming part of the invention are polypeptides encoded
by the polynucleotides of the invention, as well as fusion polypeptides comprising such
polypeptides. The invention embodies PG-3 proteins from humans, including isolated or purified
PG-3 proteins consisting of, consisting essentially of, or comprising the sequence of SEQ ID No 3.
More particularly, the present invention concerns allelic variants of the PG-3 protein comprising at
least one amino acid selected from the group consisting of an arginine or an isoleucine residue at the
amino acid position 304 of SEQ ID No 3, a histidine or an aspartic acid residue at the amino
acid position 314 of SEQ ID No 3, a threonine or an asparagine residue at the amino acid
position 682 of SEQ ID No 3, an alanine or a valine residue at the amino acid position 761 of
SEQ ID No 3, and a proline or a serine residue at the amino acid position 828 of SEQ ID No
3. In addition, the invention encompasses polypeptide variants of PG-3 comprising at least
one amino acid selected from the group consisting of a methionine or an isoleucine residue at the
position 91 of SEQ ID No 3, a valine or an alanine residue at the position 306 of SEQ ID No 3, a
proline or a serine residue at the position 413 of SEQ ID No 3, a glycine or an aspartate residue at
the position 528 of SEQ ID No 3, a valine or an alanine residue at the position 614 of SEQ ID No 3,
a threonine or an asparagine residue at the position 677 of SEQ ID No 3, a valine or an alanine
residue at the position 756 of SEQ ID No 3, a valine or an alanine residue at the position 758 of
SEQ ID No 3, a lysine or a glutamate residue at the position 809 of SEQ ID No 3, and a cysteine or
an arginine residue at the position 821 of SEQ ID No 3.
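The variant positions listed above can be encoded as a lookup table. The sketch below is an illustration, not part of the described method. It reports which of the two listed residues is present at each position of a supplied protein sequence, using the 1-based numbering of SEQ ID No 3. Because SEQ ID No 3 is not reproduced here, any test sequence is a placeholder. The dictionary and function names are hypothetical.

```python
# Residue alternatives at positions of SEQ ID No 3 (1-based), as listed in the text,
# in one-letter amino acid code.
ALLELIC_VARIANTS = {
    304: ("R", "I"), 314: ("H", "D"), 682: ("T", "N"),
    761: ("A", "V"), 828: ("P", "S"),
}
OTHER_VARIANTS = {
    91: ("M", "I"), 306: ("V", "A"), 413: ("P", "S"), 528: ("G", "D"),
    614: ("V", "A"), 677: ("T", "N"), 756: ("V", "A"), 758: ("V", "A"),
    809: ("K", "E"), 821: ("C", "R"),
}


def variant_profile(protein: str, variants: dict = ALLELIC_VARIANTS) -> dict:
    """Map each listed position to the residue found there, or to None if the
    residue is neither listed alternative or the sequence is too short."""
    profile = {}
    for pos, alternatives in variants.items():
        residue = protein[pos - 1] if pos <= len(protein) else None
        profile[pos] = residue if residue in alternatives else None
    return profile
```

Passing `OTHER_VARIANTS` instead of the default checks the second group of positions in the same way.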

The present invention includes isolated, purified, or recombinant polypeptides comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at
least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 3. The present invention also
embodies isolated, purified, and recombinant polypeptides comprising a contiguous span of at least
6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30,
40, 50, or 100 amino acids of SEQ ID No 3, wherein said contiguous span includes at least 1, 2, 3, 5
or 10 of the following amino acid positions of SEQ ID No 3: 1-100, 101-200, 201-300, 301-400,
401-500, 501-600, 601-700, 701-835. In other preferred embodiments, the contiguous stretch of
amino acids comprises the site of a mutation or functional mutation, including a deletion, addition,
swap or truncation of the amino acids in the PG-3 protein sequence.

The invention also encompasses purified, isolated, or recombinant polypeptides comprising
a sequence having at least 70, 75, 80, 85, 90, 95, 98 or 99% amino acid identity with the sequence of
SEQ ID No 3 or a fragment thereof.
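The identity thresholds above do not specify an alignment method. Assuming the two sequences have already been aligned, for example with a standard global alignment tool, percent identity can be computed as in this sketch. The gap handling shown is one common convention and an assumption here: a gap opposite a residue counts as a mismatch, and columns gapped in both sequences are ignored. The function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length ('-' = gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = identical = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # a column gapped in both sequences carries no information
        columns += 1
        if x == y:
            identical += 1  # both-gap columns were skipped, so x == y means a residue match
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * identical / columns
```

A polypeptide would then meet, say, the 80% threshold when `percent_identity(aligned_query, aligned_reference) >= 80`. Different alignment tools and gap conventions can give somewhat different values for the same pair of sequences.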

PG-3 proteins are preferably isolated from human or mammalian tissue samples or
expressed from human or mammalian genes. The PG-3 polypeptides of the invention can be made
using routine expression methods known in the art. The polynucleotide encoding the desired
polypeptide is ligated into an expression vector suitable for any convenient host. Both eukaryotic
and prokaryotic host systems can be used in forming recombinant polypeptides. The polypeptide is
then isolated from lysed cells or from the culture medium and purified to the extent needed for its
intended use. Purification can be performed by any technique known in the art, for example,
differential extraction, salt fractionation, chromatography, centrifugation, and the like. See, for
example, Methods in Enzymology for a variety of methods for purifying proteins.

In addition, shorter protein fragments can be produced by chemical synthesis. Alternatively, the
proteins of the invention can be extracted from cells or tissues of humans or non-human animals.
Methods for purifying proteins are known in the art, and include the use of detergents or chaotropic
agents to disrupt particles followed by differential extraction and separation of the polypeptides by
ion exchange chromatography, affinity chromatography, sedimentation according to density, and
gel electrophoresis.

Any PG-3 cDNA, including SEQ ID No 2, may be used to express PG-3 proteins and
polypeptides. The nucleic acid encoding the PG-3 protein or polypeptide to be expressed is operably
linked to a promoter in an expression vector using conventional cloning technology. The PG-3 insert in
the expression vector may comprise the full coding sequence for the PG-3 protein or a portion thereof.
For example, the PG-3-derived insert may encode a polypeptide comprising at least 10 consecutive
amino acids of the PG-3 protein of SEQ ID No 3, preferably at least 10 consecutive amino acids
including at least 1, 2, 3, 5 or 10 of the following amino acid positions of SEQ ID No 3: 1-100,
101-200, 201-300, 301-400, 401-500, 501-600, 601-700, 701-835.

The expression vector may be any of the mammalian, yeast, insect or bacterial expression
systems known in the art. Vectors and expression systems are commercially available from a
variety of suppliers including Genetics Institute (Cambridge, MA), Stratagene (La Jolla, California),
Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance
expression and facilitate proper protein folding, the codon context and codon pairing of the sequence
may be optimized for the particular expression organism into which the expression vector is
introduced, as explained by Hatfield et al. and U.S. Patent No. 5,082,767.

In one embodiment, the entire coding sequence of the PG-3 cDNA through the poly A signal
of the cDNA is operably linked to a promoter in the expression vector. Alternatively, if the nucleic
acid encoding a portion of the PG-3 protein lacks a methionine to serve as the initiation site, an
initiating methionine can be introduced next to the first codon of the nucleic acid using conventional
techniques. Similarly, if the insert from the PG-3 cDNA lacks a poly A signal, this sequence can be
added to the construct by, for example, splicing out the poly A signal from pSG5 (Stratagene) using
BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression
vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from Moloney
Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection.
The vector includes the Herpes Simplex Thymidine Kinase promoter and the selectable neomycin
gene. The nucleic acid encoding the PG-3 protein or a portion thereof is obtained by PCR from a
bacterial vector containing the PG-3 cDNA of SEQ ID No 2 using oligonucleotide primers
complementary to the PG-3 cDNA or portion thereof and containing restriction endonuclease
sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the corresponding cDNA
3' primer, taking care to ensure that the sequence encoding the PG-3 protein or a portion thereof is
positioned properly with respect to the poly A signal. The purified fragment obtained from the
resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII,
purified and ligated to pXT1, now containing a poly A signal and digested with BglII.
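The primer design described above adds a PstI site to the 5' primer and a BglII site to the 3' primer. The sketch below assumes an 18-nucleotide annealing region and a short "GCGC" clamp in front of each site to help the enzymes cut near the fragment end. Both are illustrative choices, not values from the text. The sketch also flags inserts that already contain either site, since internal sites would be cut during the digestion step. The recognition sequences CTGCAG (PstI) and AGATCT (BglII) are standard. The function names and the example insert are hypothetical.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence (cuts CTGCA^G)
BGLII_SITE = "AGATCT"  # BglII recognition sequence (cuts A^GATCT)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def internal_sites(insert: str) -> list:
    """Enzymes whose site already occurs inside the insert.

    Both sites are palindromic, so searching one strand is enough."""
    return [name for name, site in (("PstI", PSTI_SITE), ("BglII", BGLII_SITE))
            if site in insert]


def tailed_primers(insert: str, anneal_len: int = 18, clamp: str = "GCGC") -> tuple:
    """5' primer: clamp + PstI site + first anneal_len nt of the insert.
    3' primer: clamp + BglII site + reverse complement of the last anneal_len nt."""
    fwd = clamp + PSTI_SITE + insert[:anneal_len]
    rev = clamp + BGLII_SITE + reverse_complement(insert[-anneal_len:])
    return fwd, rev
```

For a real PG-3 insert, the annealing length would be chosen with the Tm and G+C considerations discussed earlier, and `internal_sites` should return an empty list before this cloning route is used.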

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life
Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification.
Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St.
Louis, Missouri).

The above procedures may also be used to express a mutant PG-3 protein responsible for a 
detectable phenotype or a portion thereof. 

The expressed protein is purified using conventional purification techniques such as
ammonium sulfate precipitation or chromatographic separation based on size or charge. The protein
encoded by the nucleic acid insert may also be purified using standard immunochromatography
techniques. In such procedures, a solution containing the expressed PG-3 protein or portion thereof,
such as a cell extract, is applied to a column having antibodies against the PG-3 protein or portion
thereof attached to the chromatography matrix. The expressed protein is allowed to bind the
immunochromatography column. Thereafter, the column is washed to remove non-specifically bound
proteins. The specifically bound expressed protein is then released from the column and recovered
using standard techniques.

To confirm expression of the PG-3 protein or a portion thereof, the proteins expressed from
host cells containing an expression vector with an insert encoding the PG-3 protein or a portion
thereof can be compared to the proteins expressed in host cells containing the expression vector without
an insert. The presence of a band in samples from cells containing the expression vector with an insert
which is absent in samples from cells containing the expression vector without an insert indicates that
the PG-3 protein or a portion thereof is being expressed. Generally, the band will have the mobility
expected for the PG-3 protein or portion thereof. However, the band may have a mobility different
from that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic
cleavage.

Antibodies capable of specifically recognizing the expressed PG-3 protein or a portion thereof 
are described below. 

If antibody production is not possible, the nucleic acids encoding the PG-3 protein or a portion
thereof may be incorporated into expression vectors designed for use in purification schemes employing
chimeric polypeptides. In such strategies, the nucleic acid encoding the PG-3 protein or a portion
thereof is inserted in frame with the gene encoding the other half of the chimera. The other half of the
chimera may be β-globin or a nickel-binding polypeptide encoding sequence. A chromatography matrix
having antibody to β-globin or nickel attached thereto is then used to purify the chimeric protein.
Protease cleavage sites are engineered between the β-globin gene or the nickel-binding polypeptide and
the PG-3 protein or portion thereof. Thus, the two polypeptides of the chimera may be separated from
one another by protease digestion.

One useful expression vector for generating β-globin chimeric proteins is pSG5 (Stratagene),
which encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed
transcript, and the polyadenylation signal incorporated into the construct increases the level of
expression. These techniques are well known to those skilled in the art of molecular biology. Standard
methods are published in methods texts such as Davis et al. (1986), and many of the methods are
available from Stratagene, Life Technologies, Inc., or Promega. Polypeptides may additionally be
produced from the construct using in vitro translation systems such as the In vitro Express™
Translation Kit (Stratagene).

ANTIBODIES THAT BIND PG-3 POLYPEPTIDES OF THE INVENTION 

Any PG-3 polypeptide or whole protein may be used to generate antibodies capable of
specifically binding to an expressed PG-3 protein or fragments thereof as described.

One antibody composition of the invention is capable of specifically binding to the PG-3
protein of SEQ ID No 3. For an antibody composition to specifically bind to the PG-3 protein, it
must demonstrate at least a 5%, 10%, 15%, 20%, 25%, 50%, or 100% greater binding affinity for
PG-3 protein than for another protein in an ELISA, RIA, or other antibody-based binding assay.
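The specificity criterion above is a relative comparison. The sketch below assumes that background-corrected signals from the same ELISA or RIA serve as a proxy for binding affinity. That is an assumption, since the text does not say how affinity is read out. It computes the percentage by which binding to PG-3 exceeds binding to a comparison protein. The function names and numbers are hypothetical.

```python
def percent_greater_binding(target_signal: float, other_signal: float) -> float:
    """How much greater (in %) the signal for PG-3 is than for a comparison protein.

    Assumption: both values are background-corrected readings from the same assay.
    """
    if other_signal <= 0:
        raise ValueError("comparison signal must be positive")
    return 100.0 * (target_signal - other_signal) / other_signal


def is_specific(target_signal: float, other_signal: float,
                threshold_percent: float = 5.0) -> bool:
    """True if binding to PG-3 exceeds the comparison by at least the threshold
    (5, 10, 15, 20, 25, 50 or 100 % in the text)."""
    return percent_greater_binding(target_signal, other_signal) >= threshold_percent
```

For example, signals of 1.5 and 1.0 give 50% greater binding, which satisfies every threshold listed above except 100%.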

The invention also concerns antibody compositions which are specific for variants of the PG-3 protein, more particularly variants comprising at least one amino acid selected from the group consisting of a methionine or an isoleucine residue at position 91 of SEQ ID No 3, a valine or an alanine residue at position 306 of SEQ ID No 3, a proline or a serine residue at position 413 of SEQ ID No 3, a glycine or an aspartate residue at position 528 of SEQ ID No 3, a valine or an alanine residue at position 614 of SEQ ID No 3, a threonine or an asparagine residue at position 677 of SEQ ID No 3, a valine or an alanine residue at position 756 of SEQ ID No 3, a valine or an alanine residue at position 758 of SEQ ID No 3, a lysine or a glutamate residue at position 809 of SEQ ID No 3, and a cysteine or an arginine residue at position 821 of SEQ ID No 3. More preferably, the invention encompasses antibody compositions which are specific for an allelic variant of the PG-3 protein, more particularly a variant comprising at least one amino acid selected from the group consisting of an arginine or an isoleucine residue at amino acid position 304 of SEQ ID No 3, a histidine or an aspartic acid residue at amino acid position 314 of SEQ ID No 3, a threonine or an asparagine residue at amino acid position 682 of SEQ ID No 3, an alanine or a valine residue at amino acid position 761 of SEQ ID No 3, and a proline or a serine residue at amino acid position 828 of SEQ ID No 3.

In a preferred embodiment, the invention concerns antibody compositions, either polyclonal or monoclonal, that are capable of selectively binding, or that selectively bind, to an epitope-containing polypeptide comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 3; preferably, said epitope comprises at least 1, 2, 3, 5 or 10 of the following amino acid positions of SEQ ID No 3: 1-100, 101-200, 201-300, 301-400, 401-500, 501-600, 601-700, 701-835.
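To make the span and position-block language concrete, the sketch below checks which of the listed blocks of SEQ ID No 3 a contiguous peptide span overlaps. It assumes 1-based, inclusive residue numbering over a 835-residue protein, as implied by the last block (701-835), and reads "comprises positions of a block" as "overlaps that block"; the function name is hypothetical.

```python
# Position blocks of SEQ ID No 3 listed in the text (1-based, inclusive).
BLOCKS = [(1, 100), (101, 200), (201, 300), (301, 400),
          (401, 500), (501, 600), (601, 700), (701, 835)]
PROTEIN_LENGTH = 835  # assumption taken from the last block boundary


def blocks_touched(start: int, length: int) -> list[tuple[int, int]]:
    """Return the blocks overlapped by a contiguous span of `length` residues
    beginning at 1-based position `start`."""
    if length < 6:
        raise ValueError("the text requires a contiguous span of at least 6 amino acids")
    end = start + length - 1
    if start < 1 or end > PROTEIN_LENGTH:
        raise ValueError("span must lie within residues 1-835")
    # A span overlaps a block when neither lies entirely before the other.
    return [(lo, hi) for lo, hi in BLOCKS if start <= hi and end >= lo]


# A 12-residue span starting at residue 95 straddles the first two blocks.
print(blocks_touched(95, 12))
```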
The invention also concerns a purified or isolated antibody capable of specifically binding to a mutated PG-3 protein or to a fragment or variant thereof comprising an epitope of the mutated PG-3 protein. In another preferred embodiment, the present invention concerns an antibody capable of binding to a polypeptide comprising at least 10 consecutive amino acids of a PG-3 protein and including at least one of the amino acids which can be encoded by the trait-causing mutations.

In a preferred embodiment, the invention concerns the use in the manufacture of antibodies of a polypeptide comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 3; preferably, said contiguous span comprises at least 1, 2, 3, 5 or 10 of the following amino acid positions of SEQ ID No 3: 1-100, 101-200, 201-300, 301-400, 401-500, 501-600, 601-700, 701-835.

Non-human animals or mammals, whether wild-type or transgenic, which express a different species of PG-3 than the one to which antibody binding is desired, and animals which do not express PG-3 (i.e. a PG-3 knockout animal as described herein) are particularly useful for preparing antibodies. PG-3 knockout animals will recognize all or most of the exposed regions of a PG-3 protein as foreign antigens, and therefore produce antibodies with a wider array of PG-3 epitopes. Moreover, smaller polypeptides with only 10 to 30 amino acids may be useful in obtaining specific binding to any one of the PG-3 proteins. In addition, the humoral immune system of animals which produce a species of PG-3 that resembles the antigenic sequence will preferentially recognize the differences between the animal's native PG-3 species and the antigen sequence, and produce antibodies to these unique sites in the antigen sequence. Such a technique will be particularly useful in obtaining antibodies that specifically bind to any one of the PG-3 proteins.

Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein in the body.

The antibodies of the invention may be labeled using any one of the radioactive, fluorescent or enzymatic labels known in the art.

Consequently, the invention is also directed to a method for specifically detecting the presence of a PG-3 polypeptide according to the invention in a biological sample, said method comprising the following steps:

a) bringing the biological sample into contact with a polyclonal or monoclonal antibody that specifically binds to a PG-3 polypeptide comprising an amino acid sequence of SEQ ID No 3, or to a peptide fragment or variant thereof; and

b) detecting the antigen-antibody complex formed.
The invention also concerns a diagnostic kit for detecting the presence of a PG-3 polypeptide according to the present invention in a biological sample in vitro, wherein said kit comprises:

a) a polyclonal or monoclonal antibody that specifically binds to a PG-3 polypeptide comprising the amino acid sequence of SEQ ID No 3, or to a peptide fragment or variant thereof; optionally the antibody may be labeled; and

b) a reagent allowing the detection of the antigen-antibody complexes formed, said reagent optionally carrying a label, or being able to be recognized itself by a labeled reagent (particularly in the case when the above-mentioned monoclonal or polyclonal antibody itself is not labeled).

PG-3-RELATED BIALLELIC MARKERS
Advantages Of The Biallelic Markers Of The Present Invention 

The PG-3-related biallelic markers of the present invention offer a number of important advantages over other genetic markers such as RFLP (Restriction Fragment Length Polymorphism) and VNTR (Variable Number of Tandem Repeats) markers.

The first generation of markers were RFLPs, which are variations that modify the length of a restriction fragment. However, methods used to identify and type RFLPs are relatively wasteful of materials, effort, and time. The second generation of genetic markers were VNTRs, which can be categorized as either minisatellites or microsatellites. Minisatellites are tandemly repeated DNA sequences present in units of 5-50 repeats, which are distributed along regions of the human chromosomes ranging from 0.1 to 20 kilobases in length. Since they present many possible alleles, their informative content is very high. Minisatellites are scored by performing Southern blots to identify the number of tandem repeats present in a nucleic acid sample from the individual being tested. However, there are only 10⁴ potential VNTRs that can be typed by Southern blotting. Moreover, both RFLP and VNTR markers are costly and time-consuming to develop and assay in large numbers.

Single nucleotide polymorphisms (SNPs) or biallelic markers can be used in the same manner as RFLPs and VNTRs but offer several advantages. SNPs are densely spaced in the human genome and represent the most frequent type of variation. An estimated number of more than 10⁷ sites are scattered along the 3 × 10⁹ base pairs of the human genome. Therefore, SNPs occur at a greater frequency and with greater uniformity than RFLP or VNTR markers, which means that there is a greater probability that such a marker will be found in close proximity to a genetic locus of interest. SNPs are less variable than VNTR markers but are mutationally more stable.
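As a rough check on the density statement, dividing the genome size by the estimated number of SNP sites gives the average spacing between sites, assuming (for this back-of-the-envelope estimate only) that the sites are spread uniformly. Because the text gives more than 10⁷ sites, the true average spacing would be at most about 300 base pairs.

```latex
\frac{3 \times 10^{9}\ \text{bp}}{10^{7}\ \text{sites}} = 300\ \text{bp per site}
```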

Also, the different forms of a characterized single nucleotide polymorphism, such as the biallelic markers of the present invention, are often easier to distinguish and can therefore be typed easily on a routine basis. Biallelic markers have single nucleotide based alleles and they have only two common alleles, which allows highly parallel detection and automated scoring. The biallelic markers of the present invention offer the possibility of rapid, high-throughput genotyping of a large number of individuals.

Biallelic markers are densely spaced in the genome, sufficiently informative, and can be assayed in large numbers. The combined effects of these advantages make biallelic markers extremely valuable in genetic studies. Biallelic markers can be used in linkage studies in families, in allele sharing methods, in linkage disequilibrium studies in populations, and in association studies of case-control populations or of trait positive and trait negative populations. An important aspect of the present invention is that biallelic markers allow association studies to be performed to identify genes involved in complex traits. Association studies examine the frequency of marker alleles in unrelated case- and control-populations and are generally employed in the detection of polygenic or sporadic traits. Association studies may be conducted within the general population and are not limited to studies performed on related individuals in affected families (linkage studies). Biallelic markers in different genes can be screened in parallel for direct association with disease or response to a treatment. This multiple gene approach is a powerful tool for a variety of human genetic studies as it provides the necessary statistical power to examine the synergistic effect of multiple genetic factors on a particular phenotype, drug response, sporadic trait, or disease state with a complex genetic etiology.
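Comparing marker allele frequencies between unrelated cases and controls is commonly done with a 2x2 contingency test on allele counts. The sketch below uses a Pearson chi-square test with one degree of freedom and no continuity correction; this particular test is a standard choice assumed here for illustration, not a method stated in the text, and the counts in the example are invented.

```python
import math


def allelic_chi_square(case_a1: int, case_a2: int,
                       ctrl_a1: int, ctrl_a2: int) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of
    allele counts (two alleles per individual). Returns (statistic, p_value).

    Assumes every row and column total is non-zero.
    """
    table = [[case_a1, case_a2], [ctrl_a1, ctrl_a2]]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented counts: allele 1 is seen 60 times in 200 case chromosomes
# and 40 times in 200 control chromosomes.
stat, p = allelic_chi_square(60, 140, 40, 160)
print(round(stat, 3), round(p, 3))
```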

Candidate Gene Of The Present Invention 

Different approaches can be employed to perform association studies: genome-wide association studies, candidate region association studies and candidate gene association studies. Genome-wide association studies rely on the screening of genetic markers evenly spaced and covering the entire genome. The candidate gene approach is based on the study of genetic markers specifically located in genes potentially involved in a biological pathway related to the trait of interest. In the present invention, PG-3 is a good candidate gene for cancer. The candidate gene analysis clearly provides a short-cut approach to the identification of genes and gene polymorphisms related to a particular trait when some information concerning the biology of the trait is available. However, it should be noted that all of the biallelic markers disclosed in the instant application can be employed as part of genome-wide association studies or as part of candidate region association studies, and such uses are specifically contemplated in the present invention and claims.

PG-3-Related Biallelic Markers And Polynucleotides Related Thereto 
The invention also concerns PG-3-related biallelic markers. As used herein, the term "PG-3-related biallelic marker" relates to a set of biallelic markers in linkage disequilibrium with the PG-3 gene. The term PG-3-related biallelic marker includes the biallelic markers designated A1 to A80.

A portion of the biallelic markers of the present invention are disclosed in Table 2. Their locations in the PG-3 gene are indicated in Table 2 and also as a single base polymorphism in the features of SEQ ID Nos 1 and 2 listed in the accompanying Sequence Listing. The pairs of primers
The invention further concerns a nucleic acid encoding the PG-3 protein, wherein said nucleic acid comprises a polymorphic base of a biallelic marker selected from the group consisting of A1 to A80 and the complements thereof.

The invention also encompasses the use of any polynucleotide for, or any polynucleotide for use in, determining the identity of one or more nucleotides at a PG-3-related biallelic marker. In addition, the polynucleotides of the invention for use in determining the identity of one or more nucleotides at a PG-3-related biallelic marker encompass polynucleotides with any further limitation described in this disclosure, or those following, specified alone or in any combination. Optionally, said PG-3-related biallelic marker is selected from the group consisting of A1 to A80, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said PG-3-related biallelic marker is selected from the group consisting of A1 to A5 and A8 to A80, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said PG-3-related biallelic marker is selected from the group consisting of A6 and A7, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said polynucleotide may comprise a sequence disclosed in the present specification; optionally, said polynucleotide may comprise, consist of, or consist essentially of any polynucleotide described in the present specification; optionally, said determining may involve a hybridization assay, sequencing assay, microsequencing assay, or an enzyme-based mismatch detection assay; optionally, said polynucleotide may be attached to a solid support, array, or addressable array; optionally, said polynucleotide may be labeled. A preferred polynucleotide may be used in a hybridization assay for determining the identity of the nucleotide at a PG-3-related biallelic marker. Another preferred polynucleotide may be used in a sequencing or microsequencing assay for determining the identity of the nucleotide at a PG-3-related biallelic marker. A third preferred polynucleotide may be used in an enzyme-based mismatch detection assay for determining the identity of the nucleotide at a PG-3-related biallelic marker. A fourth preferred polynucleotide may be used in amplifying a segment of polynucleotides comprising a PG-3-related biallelic marker. Optionally, any of the polynucleotides described above may be attached to a solid support, array, or addressable array; optionally, said polynucleotide may be labeled.

Additionally, the invention encompasses the use of any polynucleotide for, or any polynucleotide for use in, amplifying a segment of nucleotides comprising a PG-3-related biallelic marker. In addition, the polynucleotides of the invention for use in amplifying a segment of nucleotides comprising a PG-3-related biallelic marker encompass polynucleotides with any further limitation described in this disclosure, or those following, specified alone or in any combination: optionally, said PG-3-related biallelic marker is selected from the group consisting of A1 to A80, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said PG-3-related biallelic marker is selected from the group consisting of A1 to A5 and A8 to A80, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said PG-3-related biallelic marker is selected from the group consisting of A6 and A7, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said polynucleotide may comprise a sequence disclosed in the present specification; optionally, said polynucleotide may comprise, consist of, or consist essentially of any polynucleotide described in the present specification; optionally, said amplifying may involve PCR or LCR. Optionally, said polynucleotide may be attached to a solid support, array, or addressable array. Optionally, said polynucleotide may be labeled.

The primers for amplification or sequencing reactions of a polynucleotide comprising a biallelic marker of the invention may be designed from the disclosed sequences for any method known in the art. A preferred set of primers is fashioned such that the 3' end of the contiguous span of identity with a sequence selected from the group consisting of SEQ ID Nos 1 and 2, or a sequence complementary thereto or a variant thereof, is present at the 3' end of the primer. Such a configuration allows the 3' end of the primer to hybridize to a selected nucleic acid sequence and dramatically increases the efficiency of the primer for amplification or sequencing reactions. Allele-specific primers may be designed such that a polymorphic base of a biallelic marker is at the 3' end of the contiguous span and the contiguous span is present at the 3' end of the primer. Such allele-specific primers tend to selectively prime an amplification or sequencing reaction so long as they are used with a nucleic acid sample that contains the one of the two alleles at the biallelic marker that matches their 3' terminal base.
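The allele-specific design described above places the polymorphic base at the primer's 3' terminus. A minimal sketch, assuming the flanking sequence is supplied 5' to 3' as a plain string and the marker position is a 0-based index into it; the function name and the toy sequence are hypothetical and are not taken from SEQ ID Nos 1 or 2.

```python
def allele_specific_primers(seq: str, marker_pos: int,
                            alleles: tuple[str, str], length: int = 20) -> dict[str, str]:
    """Build one forward primer per allele whose 3' terminal base is the
    polymorphic base.

    seq        -- flanking sequence, 5'->3' (hypothetical input)
    marker_pos -- 0-based index of the polymorphic base in seq
    length     -- total primer length, including the allele base
    """
    first = marker_pos - (length - 1)
    if first < 0:
        raise ValueError("not enough upstream sequence for a primer of this length")
    upstream = seq[first:marker_pos]  # the length-1 bases immediately 5' of the marker
    return {allele: upstream + allele for allele in alleles}


# Toy sequence with a G/A marker at index 10; 6-mers for readability.
print(allele_specific_primers("ACGTACGTACGTTTT", 10, ("G", "A"), length=6))
```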

The 3' end of the primer of the invention may be located within, or at least 2, 4, 6, 8, 10, 12, 15, 18, 20, 25, 50, 100, 250, 500, or 1000 nucleotides upstream of, a PG-3-related biallelic marker in said sequence, or at any other location which is appropriate for its intended use in sequencing, amplification or the location of novel sequences or markers. Thus, another set of preferred amplification primers comprises an isolated polynucleotide consisting essentially of a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 30, 35, 40, or 50 nucleotides in length of a sequence selected from the group consisting of SEQ ID Nos 1 and 2, or a sequence complementary thereto or a variant thereof, wherein the 3' end of said contiguous span is located at the 3' end of said polynucleotide, and wherein the 3' end of said polynucleotide is located upstream of a PG-3-related biallelic marker in said sequence. Preferably, those amplification primers comprise a sequence selected from the group consisting of the sequences B1 to B52 and C1 to C52. Primers with their 3' ends located 1 nucleotide upstream of a biallelic marker of PG-3 have a special utility in microsequencing assays. Preferred microsequencing primers are described in Table 4. Optionally, said PG-3-related biallelic marker is selected from the group consisting of A1 to A80, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said PG-3-related biallelic marker is selected from the group consisting of A1 to A5 and A8 to A80, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said PG-3-related biallelic marker is selected from the group consisting of A6 and A7, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith. Optionally, microsequencing primers are selected from the group consisting of the nucleotide sequences of D1 to D4, D6 to D80, E1 to E4 and E6 to E80. More preferred microsequencing primers are selected from the group consisting of the nucleotide sequences of D14, D46, D68, D70, D71, E3, E6, E7, E11, E13, E42, E44, E72 and E75.
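A microsequencing primer ends 1 nucleotide before the marker, so a single-base extension reads out the polymorphic base. The sketch below derives such a primer from each side of a marker, assuming a 5'->3' flanking sequence and a 0-based marker index. It does not reproduce the D or E primers of Table 4; all names and sequences are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return s.translate(COMPLEMENT)[::-1]


def microsequencing_primers(seq: str, marker_pos: int, length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers whose 3' ends lie 1 nt from the marker.

    forward: identical to the `length` bases immediately 5' of the marker; it anneals
             to the complementary strand, so a single-base extension incorporates the
             allele as written in `seq`.
    reverse: reverse complement of the `length` bases immediately 3' of the marker;
             its single-base extension incorporates the complement of the allele.
    """
    if marker_pos < length or marker_pos + 1 + length > len(seq):
        raise ValueError("not enough flanking sequence")
    forward = seq[marker_pos - length:marker_pos]
    reverse = reverse_complement(seq[marker_pos + 1:marker_pos + 1 + length])
    return forward, reverse


# Toy sequence with the marker (G) at index 8; 4-mers for readability.
print(microsequencing_primers("AAAACCCCGTTTTGGGG", 8, length=4))
```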

The probes of the present invention may be designed from the disclosed sequences for use in any method known in the art, particularly methods for testing if a marker disclosed herein is present in a sample. A preferred set of probes may be designed for use in the hybridization assays of the invention in any manner known in the art such that they selectively bind to one allele of a biallelic marker, but not the other, under any particular set of assay conditions. Preferred hybridization probes comprise the polymorphic base of either allele 1 or allele 2 of the relevant biallelic marker. Optionally, said biallelic marker may be within 6, 5, 4, 3, 2, or 1 nucleotides of the center of the hybridization probe or at the center of said probe. In a preferred embodiment, the probes are selected from the group consisting of the sequences P1 to P4 and P6 to P80 and the sequences complementary thereto.
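Placing the polymorphic base at the center of a probe can be sketched as below, assuming an odd probe length so that the marker sits exactly in the middle, a 5'->3' flanking sequence, and a 0-based marker index. The function name and toy sequence are hypothetical; the P1 to P80 probes themselves are not reproduced here.

```python
def centered_probes(seq: str, marker_pos: int,
                    alleles: tuple[str, str], length: int = 17) -> dict[str, str]:
    """One hybridization probe per allele, of odd `length`, with the polymorphic
    base at the center position."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the marker can sit exactly at the center")
    half = length // 2
    if marker_pos - half < 0 or marker_pos + half >= len(seq):
        raise ValueError("not enough flanking sequence")
    left = seq[marker_pos - half:marker_pos]
    right = seq[marker_pos + 1:marker_pos + 1 + half]
    return {allele: left + allele + right for allele in alleles}


# Toy sequence with a G/A marker at index 8; 5-mer probes for readability.
print(centered_probes("AAAACCCCGTTTTGGGG", 8, ("G", "A"), length=5))
```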

It should be noted that the polynucleotides of the present invention are not limited to having the exact flanking sequences surrounding the polymorphic bases which are enumerated in the Sequence Listing. Rather, it will be appreciated that the flanking sequences surrounding the biallelic markers may be lengthened or shortened to any extent compatible with their intended use, and the present invention specifically contemplates such sequences. The flanking regions outside of the contiguous span need not be homologous to native flanking sequences which actually occur in human subjects. The addition of any nucleotide sequence which is compatible with the polynucleotide's intended use is specifically contemplated.

Primers and probes may be labeled or immobilized on a solid support as described in the section entitled "Oligonucleotide probes and primers".

The polynucleotides of the invention which are attached to a solid support encompass polynucleotides with any further limitation described in this disclosure, or those following, alone or in any combination: optionally, said polynucleotides may be attached individually or in groups of at least 2, 5, 8, 10, 12, 15, 20, or 25 distinct polynucleotides of the invention to a single solid support. Optionally, polynucleotides other than those of the invention may be attached to the same solid support as polynucleotides of the invention. Optionally, when multiple polynucleotides are attached to a solid support, they may be attached at random locations or in an ordered array. Optionally, said ordered array may be addressable.

The present invention also encompasses diagnostic kits comprising one or more polynucleotides of the invention with a portion or all of the necessary reagents and instructions for genotyping a test subject by determining the identity of a nucleotide at a PG-3-related biallelic marker. The polynucleotides of a kit may optionally be attached to a solid support, or be part of an array or addressable array of polynucleotides. The kit may provide for the determination of the identity of the nucleotide at a marker position by any method known in the art including, but not limited to, a sequencing assay method, a microsequencing assay method, a hybridization assay method, or an enzyme-based mismatch detection assay method.

METHODS FOR DE NOVO IDENTIFICATION OF BIALLELIC MARKERS
Any of a variety of methods can be used to screen a genomic fragment for single nucleotide 
polymorphisms, including methods such as differential hybridization with oligonucleotide probes, 
detection of changes in the mobility measured by gel electrophoresis or direct sequencing of the 
amplified nucleic acid. A preferred method for identifying biallelic markers involves comparative 
sequencing of genomic DNA fragments from an appropriate number of unrelated individuals. 

In a first embodiment, DNA samples from unrelated individuals are pooled together, following which the genomic DNA of interest is amplified and sequenced. The nucleotide sequences thus obtained are then analyzed to identify significant polymorphisms. One of the major advantages of this method is that pooling the DNA samples substantially reduces the number of DNA amplification reactions and sequencing reactions that must be carried out. Moreover, this method is sufficiently sensitive that a biallelic marker obtained thereby usually demonstrates a sufficient frequency of its less common allele to be useful in conducting association studies.

In a second embodiment, the DNA samples are not pooled and are therefore amplified and sequenced individually. This method is usually preferred when biallelic markers need to be identified in order to perform association studies within candidate genes. Preferably, highly relevant gene regions such as promoter regions or exon regions are screened for biallelic markers. A biallelic marker obtained using this method may show a lower degree of informativeness for conducting association studies, e.g. if the frequency of its less frequent allele is less than about 10%. Such a biallelic marker will, however, be sufficiently informative to conduct association studies, and it will further be appreciated that including less informative biallelic markers in the genetic analysis studies of the present invention may, in some cases, allow the direct identification of causal mutations, which may, depending on their penetrance, be rare mutations.
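The roughly 10% threshold on the less frequent allele can be checked directly from genotype counts at a biallelic marker. A minimal sketch, assuming counts of the three genotypes (homozygous allele 1, heterozygous, homozygous allele 2) are available from individually sequenced samples; the function names and the default cutoff parameter are hypothetical.

```python
def minor_allele_frequency(n_11: int, n_12: int, n_22: int) -> float:
    """Frequency of the less common allele from counts of the three genotypes."""
    n_alleles = 2 * (n_11 + n_12 + n_22)
    if n_alleles == 0:
        raise ValueError("no genotypes")
    allele1 = 2 * n_11 + n_12  # copies of allele 1
    allele2 = 2 * n_22 + n_12  # copies of allele 2
    return min(allele1, allele2) / n_alleles


def is_low_informativeness(maf: float, cutoff: float = 0.10) -> bool:
    """Flag markers whose less frequent allele falls below the cutoff (about 10% in the text)."""
    return maf < cutoff


# Invented counts for 100 individuals: 85 homozygous 1/1, 14 heterozygous, 1 homozygous 2/2.
maf = minor_allele_frequency(85, 14, 1)
print(maf, is_low_informativeness(maf))
```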

The following is a description of the various parameters of a preferred method used by the 
inventors for the identification of the biallelic markers of the present invention. 
Genomic DNA Samples 

The genomic DNA samples from which the biallelic markers of the present invention are generated are preferably obtained from unrelated individuals corresponding to a heterogeneous population of known ethnic background. The number of individuals from whom DNA samples are obtained can vary substantially, but is preferably from about 10 to about 1000, or more preferably from about 50 to about 200 individuals. It is usually preferred to collect DNA samples from at least about 100 individuals in order to have sufficient polymorphic diversity in a given population to identify as many markers as possible and to generate statistically significant results.
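One way to see why around 100 individuals are preferred is to compute the chance that an allele of a given frequency appears at least once in the sample. The sketch below assumes each individual contributes two independently sampled chromosomes, a simplification introduced here rather than a statement from the text; the function name is hypothetical.

```python
def prob_allele_seen(allele_freq: float, n_individuals: int) -> float:
    """Probability that at least one copy of an allele with frequency `allele_freq`
    is present among 2 * n_individuals independently sampled chromosomes."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return 1.0 - (1.0 - allele_freq) ** (2 * n_individuals)


# An allele at 5% frequency is likely, but not certain, to be seen with 10 donors,
# and almost certain to be seen with 100 donors.
for n in (10, 100):
    print(n, round(prob_allele_seen(0.05, n), 4))
```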

As for the source of the genomic DNA to be subjected to analysis, any test sample can be foreseen without any particular limitation. These test samples include biological samples, which can be tested by the methods of the present invention described herein, and include human and animal body fluids such as whole blood, serum, plasma, cerebrospinal fluid, urine, lymph fluids, and various external secretions of the respiratory, intestinal and genitourinary tracts, tears, saliva, milk, white blood cells, myelomas and the like; biological fluids such as cell culture supernatants; fixed tissue specimens including tumor and non-tumor tissue and lymph node tissues; bone marrow aspirates and fixed cell specimens. The preferred source of genomic DNA used in the present invention is the peripheral venous blood of each donor. Techniques to prepare genomic DNA from biological samples are well known to the skilled technician. Details of a preferred embodiment are provided in Example 1. The person skilled in the art can choose to amplify pooled or unpooled DNA samples.

DNA Amplification

The identification of biallelic markers in a sample of genomic DNA may be facilitated 
through the use of DNA amplification methods. DNA samples can be pooled or unpooled for the 
amplification step. DNA amplification techniques are well known to those skilled in the art. 

Amplification techniques that can be used in the context of the present invention include, but are not limited to, the ligase chain reaction (LCR) described in EP-A-320 308, WO 9320227 and EP-A-439 182, the polymerase chain reaction (PCR, RT-PCR) and techniques such as the nucleic acid sequence based amplification (NASBA) described in Guatelli J.C., et al. (1990) and in Compton J. (1991), Q-beta amplification as described in European Patent Application No 4544610, strand displacement amplification as described in Walker et al. (1996) and EP A 684 315, and target mediated amplification as described in PCT Publication WO 9322461.

LCR and Gap LCR are exponential amplification techniques, both of which utilize DNA ligase to join adjacent primers annealed to a DNA molecule. In Ligase Chain Reaction (LCR), probe pairs are used which include two primary (first and second) and two secondary (third and fourth) probes, all of which are employed in molar excess to target. The first probe hybridizes to a first segment of the target strand and the second probe hybridizes to a second segment of the target strand, the first and second segments being contiguous so that the primary probes abut one another in a 5' phosphate-3' hydroxyl relationship, and so that a ligase can covalently fuse or ligate the two probes into a fused product. In addition, a third (secondary) probe can hybridize to a portion of the first probe and a fourth (secondary) probe can hybridize to a portion of the second probe in a similar abutting fashion. Of course, if the target is initially double stranded, the secondary probes will also hybridize to the target complement in the first instance. Once the ligated strand of primary probes is separated from the target strand, it will hybridize with the third and fourth probes, which can be ligated to form a complementary, secondary ligated product. It is important to realize that the ligated products are functionally equivalent to either the target or its complement. By repeated cycles of hybridization and ligation, amplification of the target sequence is achieved. A method for multiplex LCR has also been described (WO 9320227). Gap LCR (GLCR) is a version of LCR where the probes are not adjacent but are separated by 2 to 3 bases.
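The requirement that the primary probes bind contiguous segments (LCR) or segments separated by a short gap (Gap LCR) can be illustrated with a toy footprint check. This sketch is a simplification assumed for illustration only: it models perfect-match hybridization of each probe to the target strand and compares the positions of the bound segments. It does not model ligase chemistry, gap filling, or the secondary probes, and all names and sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def footprint(target: str, probe: str) -> Optional[tuple[int, int]]:
    """Half-open interval of `target` (5'->3') bound by a perfectly complementary probe."""
    site = reverse_complement(probe)
    i = target.find(site)
    return None if i < 0 else (i, i + len(site))


def probes_positioned(target: str, probe1: str, probe2: str, gap: int = 0) -> bool:
    """True if both probes bind and their footprints are separated by exactly `gap`
    bases: gap=0 corresponds to abutting LCR probes, gap=2 or 3 to Gap LCR."""
    f1, f2 = footprint(target, probe1), footprint(target, probe2)
    if f1 is None or f2 is None:
        return False
    return f2[0] - f1[1] == gap or f1[0] - f2[1] == gap


target = "ACGTTGCAAGGCTTAC"
print(probes_positioned(target, "CAACGT", "GCCTTG"))         # abutting footprints
print(probes_positioned(target, "CAACGT", "AAGCCT", gap=2))  # 2-base gap
```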

For amplification of mRNAs, it is within the scope of the present invention to reverse transcribe mRNA into cDNA followed by polymerase chain reaction (RT-PCR); or to use a single enzyme for both steps as described in U.S. Patent No. 5,322,770; or to use Asymmetric Gap LCR (RT-AGLCR) as described by Marshall et al. (1994). AGLCR is a modification of GLCR that allows the amplification of RNA.

The PCR technology is the preferred amplification technique used in the present invention. A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR technology, see White (1992) and the publication entitled "PCR Methods and Applications" (1991, Cold Spring Harbor Laboratory Press). In each of these PCR procedures, PCR primers on either side of the nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid sample along with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase, or Vent polymerase. The nucleic acid in the sample is denatured and the PCR primers are specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized primers are extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The cycles are repeated multiple times to produce an amplified fragment containing the nucleic acid sequence between the primer sites. PCR has further been described in several patents including US Patents 4,683,195; 4,683,202; and 4,965,188.
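Because each denaturation, hybridization and extension cycle can at most double the number of target copies, the yield grows exponentially with the number of cycles. The sketch below uses the common idealized model N = N0 × (1 + E)^n, where the per-cycle efficiency E is an assumption introduced here (E = 1.0 means perfect doubling); real reactions plateau, which this model ignores.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential PCR model: N = N0 * (1 + E) ** n.

    efficiency -- assumed per-cycle efficiency between 0 and 1 (1.0 = doubling).
    Plateau effects from reagent depletion are not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# 10 template copies after 30 perfect cycles, and 1 copy after 10 cycles at 90% efficiency.
print(pcr_copies(10, 30))
print(round(pcr_copies(1, 10, 0.9), 1))
```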

The PCR technology is the preferred amplification technique used to identify new biallelic markers. A typical example of a PCR reaction suitable for the purposes of the present invention is provided in Example 2.

One of the aspects of the present invention is a method for the amplification of the human PG-3 gene, particularly of a fragment of the genomic sequence of SEQ ID No 1 or of the cDNA sequence of SEQ ID No 2, or a fragment or a variant thereof, in a test sample, preferably using the PCR technology. This method comprises the steps of:

a) contacting a test sample with amplification reaction reagents comprising a pair of amplification primers as described above which are located on either side of the polynucleotide region to be amplified, and

b) optionally, detecting the amplification products.
The invention also concerns a kit for the amplification of a PG-3 gene sequence, 
35 particularly of a portion of the genomic sequence of SEQ ED No 1 or of the cDNA sequence of SEQ 
ID No 2, or a variant thereof in a test sample, wherein said kit comprises: 
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a) a pair of oligonucleotide primers located on either side of the PG-3 region to be 
amplified; 

b) optionally, the reagents necessary for performing the amplification reaction. 

In one embodiment of the above amplification method and kit, the amplification product is
detected by hybridization with a labeled probe having a sequence which is complementary to the
amplified region. In another embodiment of the above amplification method and kit, the primers
comprise a sequence selected from the group consisting of the nucleotide sequences of B1
to B52, C1 to C52, D1 to D4, D6 to D80, E1 to E4, and E6 to E80.

In a first embodiment of the present invention, biallelic markers are identified using
genomic sequence information generated by the inventors. Sequenced genomic DNA fragments are
used to design primers for the amplification of 500 bp fragments. These 500 bp fragments are
amplified from genomic DNA and are scanned for biallelic markers. Primers may be designed
using the OSP software (Hillier L. and Green P., 1991). All primers may contain, upstream of the
specific target bases, a common oligonucleotide tail that serves as a sequencing primer. Those
skilled in the art are familiar with primer extensions, which can be used for these purposes.

Preferred primers, useful for the amplification of genomic sequences encoding the
candidate genes, focus on promoters, exons and splice sites of the genes, because a biallelic marker
located in these functional regions of the gene is more likely to be a causal mutation. Preferred
amplification primers of the invention include the nucleotide sequences B1 to B52
and C1 to C52, detailed further in Example 2, Table 1.

Sequencing Of Amplified Genomic DNA And Identification Of Single Nucleotide 
Polymorphisms 

The amplification products generated as described above are then sequenced using any
method known and available to the skilled technician. Methods for sequencing DNA using either
the dideoxy-mediated method (Sanger method) or the Maxam-Gilbert method are widely known to
those of ordinary skill in the art. Such methods are disclosed, for example, in Sambrook et al. (1989).
Alternative approaches include hybridization to high-density DNA probe arrays as described in
Chee et al. (1996).

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing
reactions using a dye-primer cycle sequencing protocol. The products of the sequencing reactions
are run on sequencing gels and the sequences are determined using gel image analysis. The
polymorphism search is based on the presence of superimposed peaks in the electrophoresis pattern
resulting from different bases occurring at the same position. Because each dideoxy terminator is
labeled with a different fluorescent molecule, the two peaks corresponding to a biallelic site present
distinct colors corresponding to two different nucleotides at the same position on the sequence.
However, the presence of two peaks can be an artifact due to background noise. To exclude such an
artifact, the two DNA strands are sequenced and a comparison between the peaks is carried out. In
order to confirm that a sequence is polymorphic, the polymorphism must be detected on both strands.
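The two-strand confirmation rule can be expressed as a small decision function. This is a hypothetical sketch: it assumes that peak heights for the four bases at one aligned position are already available from each strand's trace, and the ratio threshold `min_ratio` (secondary peak height relative to the main peak) is an illustrative value, not one given in this description.

```python
from __future__ import annotations

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def double_peak_bases(peaks: dict[str, float], min_ratio: float = 0.3) -> frozenset[str] | None:
    """Return the two bases forming a superimposed peak, or None if only one peak stands out."""
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2 or ranked[0][1] <= 0:
        return None
    (base1, height1), (base2, height2) = ranked[0], ranked[1]
    if height2 / height1 < min_ratio:
        return None  # secondary peak too small: treated as background noise
    return frozenset({base1, base2})


def confirmed_on_both_strands(forward_peaks: dict[str, float],
                              reverse_peaks: dict[str, float],
                              min_ratio: float = 0.3) -> bool:
    """Call a position polymorphic only if both strands show the same two alleles.

    Reverse-strand bases are complemented before comparison, because that strand
    reads the complementary base at the same site.
    """
    forward = double_peak_bases(forward_peaks, min_ratio)
    reverse = double_peak_bases(reverse_peaks, min_ratio)
    if forward is None or reverse is None:
        return False
    return forward == frozenset(COMPLEMENT[base] for base in reverse)
```

A double peak seen on only one strand is rejected, which mirrors the requirement above that a polymorphism be detected on both strands before it is accepted.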

The above procedure allows the identification of amplification products that contain biallelic
markers. The detection limit for the frequency of biallelic polymorphisms detected by
sequencing pools of 100 individuals is approximately 0.1 for the minor allele, as verified by
sequencing pools of known allelic frequencies. In practice, more than 90% of the biallelic
polymorphisms detected by the pooling method have a frequency for the minor allele higher than
0.25. Therefore, the biallelic markers selected by this method have a frequency of at least 0.1 for
the minor allele and less than 0.9 for the major allele. Preferably, the biallelic markers selected by
this method have a frequency of at least 0.2 for the minor allele and less than 0.8 for the major
allele, more preferably at least 0.3 for the minor allele and less than 0.7 for the major allele. Thus,
the biallelic markers preferably have a heterozygosity rate higher than 0.18, more preferably higher
than 0.32, still more preferably higher than 0.42.
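The heterozygosity thresholds follow from the minor-allele frequency thresholds. Assuming Hardy-Weinberg proportions, the expected heterozygosity of a biallelic marker with allele frequencies p and q = 1 - p is H = 1 - (p^2 + q^2) = 2pq, so minor-allele frequencies of 0.1, 0.2 and 0.3 give H = 0.18, 0.32 and 0.42. A minimal sketch of that calculation (the function name is illustrative):

```python
def expected_heterozygosity(minor_allele_freq: float) -> float:
    """Expected heterozygosity 2pq of a biallelic marker under Hardy-Weinberg equilibrium."""
    if not 0.0 <= minor_allele_freq <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    p = minor_allele_freq
    q = 1.0 - p
    return 1.0 - (p * p + q * q)  # algebraically equal to 2 * p * q


if __name__ == "__main__":
    for maf in (0.1, 0.2, 0.3):
        print(maf, round(expected_heterozygosity(maf), 2))
```

Running the loop prints 0.18, 0.32 and 0.42, the three heterozygosity thresholds given above.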

In another embodiment, biallelic markers are detected by sequencing individual DNA
samples. In some embodiments, the frequency of the minor allele of such a biallelic marker may be
less than 0.1.

Validation Of The Biallelic Markers Of The Present Invention 

The polymorphisms are evaluated for their usefulness as genetic markers by validating that
both alleles are present in a population. Validation of the biallelic markers is accomplished by
genotyping a group of individuals by a method of the invention and demonstrating that both alleles
are present. Microsequencing is a preferred method of genotyping alleles. The validation by
genotyping step may be performed on individual samples derived from each individual in the group
or by genotyping a pooled sample derived from more than one individual. The group can be as
small as one individual if that individual is heterozygous for the allele in question. Preferably the
group contains at least three individuals, more preferably five or six individuals, so that a single
validation test is more likely to validate more of the biallelic markers being tested. It should be
noted, however, that a validation test performed on a small group may give a false negative result
if, because of sampling error, none of the individuals tested carries one of the two alleles. Thus, the
validation process is less useful in demonstrating that a particular initial result is an artifact than it
is in demonstrating that there is a bona fide biallelic marker at a particular position in a sequence.
All of the genotyping, haplotyping, association, and interaction study methods of the invention may
optionally be performed solely with validated biallelic markers.
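The risk of a false negative in a small validation group can be estimated. Assuming Hardy-Weinberg proportions and individuals sampled independently from the population (assumptions not stated in the text above), the probability that none of n individuals carries an allele of frequency q is (1 - q)^(2n), since all 2n sampled chromosomes must carry the other allele. The sketch below, with hypothetical function names, also finds the smallest group for a chosen tolerated miss probability.

```python
import math


def prob_allele_missed(allele_freq: float, n_individuals: int) -> float:
    """Probability that no sampled chromosome carries the allele (2 chromosomes per person)."""
    return (1.0 - allele_freq) ** (2 * n_individuals)


def min_group_size(allele_freq: float, max_miss_prob: float) -> int:
    """Smallest number of individuals whose miss probability is at most max_miss_prob."""
    if not (0.0 < allele_freq < 1.0 and 0.0 < max_miss_prob < 1.0):
        raise ValueError("frequencies and probabilities must lie strictly between 0 and 1")
    # Solve (1 - q)^(2n) <= max_miss_prob for n; both logarithms are negative.
    n = math.ceil(math.log(max_miss_prob) / (2 * math.log(1.0 - allele_freq)))
    return max(n, 1)
```

For an allele at frequency 0.3, for example, a group of three individuals misses it with probability 0.7^6 ≈ 0.12 and a group of six with probability 0.7^12 ≈ 0.014, which illustrates why somewhat larger validation groups are preferred.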

Evaluation Of The Frequency Of The Biallelic Markers Of The Present Invention 

The validated biallelic markers are further evaluated for their usefulness as genetic markers
by determining the frequency of the least common allele at the biallelic marker site. The higher the
frequency of the less common allele, the greater the usefulness of the biallelic marker in association
and interaction studies. The identification of the least common allele is accomplished by
genotyping a group of individuals by a method of the invention and demonstrating that both alleles
are present. The determination of marker frequency by genotyping may be performed using
individual samples derived from each individual in the group or by genotyping a pooled sample
derived from more than one individual. The group must be large enough to be representative of the
population as a whole. Preferably the group contains at least 20 individuals, more preferably at
least 50 individuals, most preferably at least 100 individuals. The larger the group, the greater the
accuracy of the frequency determination, because of reduced sampling error. A biallelic marker
wherein the frequency of the less common allele is 30% or more is termed a "high quality biallelic
marker." All of the genotyping, haplotyping, association, and interaction study methods of the
invention may optionally be performed solely with high quality biallelic markers.
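The frequency determination, and the effect of group size on its accuracy, can be sketched from genotype counts. This is an illustration under stated assumptions: genotypes are counted individually (not from a pooled sample), the standard error uses the binomial approximation over the 2N sampled chromosomes, which presumes Hardy-Weinberg proportions, and the function names are hypothetical. The 0.30 cut-off restates the "high quality biallelic marker" definition.

```python
from __future__ import annotations

import math


def minor_allele_frequency(n_hom_1: int, n_het: int, n_hom_2: int) -> tuple[float, float]:
    """Return (minor allele frequency, approximate standard error) from genotype counts."""
    n_individuals = n_hom_1 + n_het + n_hom_2
    if n_individuals == 0:
        raise ValueError("no genotypes supplied")
    n_chromosomes = 2 * n_individuals
    freq_2 = (2 * n_hom_2 + n_het) / n_chromosomes   # frequency of allele 2
    maf = min(freq_2, 1.0 - freq_2)                  # the less common allele
    std_err = math.sqrt(maf * (1.0 - maf) / n_chromosomes)
    return maf, std_err


def is_high_quality(maf: float, threshold: float = 0.30) -> bool:
    """'High quality biallelic marker': less common allele at 30% or more."""
    return maf >= threshold
```

With the same allele frequency of 0.3, 100 individuals give a standard error of about 0.032 while 20 individuals give about 0.072, which is the reduction in sampling error that favors larger groups.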

METHODS FOR GENOTYPING AN INDIVIDUAL FOR BIALLELIC MARKERS 
Methods are provided to genotype a biological sample for one or more biallelic markers of
the present invention, all of which may be performed in vitro. Such methods of genotyping
comprise determining the identity of a nucleotide at a PG-3 biallelic marker site by any method
known in the art. These methods find use in genotyping case-control populations in association
studies, as well as in genotyping individuals to detect alleles of biallelic markers which are
known to be associated with a given trait; in the latter case both copies of the biallelic marker
present in the individual's genome are determined so that the individual may be classified as
homozygous or heterozygous for a particular allele.

These genotyping methods can be performed on nucleic acid samples derived from a single 
individual or pooled DNA samples. 

Genotyping can be performed using methods similar to those described above for the
identification of the biallelic markers, or using other genotyping methods such as those further
described below. In preferred embodiments, the comparison of sequences of amplified genomic
fragments from different individuals is used to identify new biallelic markers, whereas
microsequencing is used for genotyping known biallelic markers in diagnostic and association study
applications.

In one embodiment, the invention encompasses methods of genotyping comprising
determining the identity of a nucleotide at a PG-3-related biallelic marker or the complement
thereof in a biological sample. Optionally, the PG-3-related biallelic marker is selected from the
group consisting of A1 to A80, and the complements thereof, or optionally the biallelic markers in
linkage disequilibrium therewith; optionally, said PG-3-related biallelic marker is selected from
the group consisting of A1 to A5 and A8 to A80, and the complements thereof, or optionally
the biallelic markers in linkage disequilibrium therewith; optionally, said PG-3-related
biallelic marker is selected from the group consisting of A6 and A7, and the complements thereof,
or optionally the biallelic markers in linkage disequilibrium therewith. Optionally, the biological
sample is derived from a single subject; optionally, the identity of the nucleotides at said biallelic
marker is determined for both copies of said biallelic marker present in said individual's genome;
optionally, said biological sample is derived from multiple subjects. Optionally, the genotyping
methods of the invention encompass methods with any further limitation described in this
disclosure, or those following, alone or in any combination. Optionally, said method is performed
in vitro; optionally, the method further comprises amplifying a portion of said sequence comprising
the biallelic marker prior to said determining step; optionally, the amplification is performed by
PCR, LCR, or replication of a recombinant vector comprising an origin of replication and said
fragment in a host cell; optionally, the determination involves a hybridization assay, a sequencing
assay, a microsequencing assay, or an enzyme-based mismatch detection assay.
Source of Nucleic Acids for Genotyping

Any source of nucleic acids, in purified or non-purified form, can be utilized as the starting
nucleic acid, provided it contains or is suspected of containing the specific nucleic acid sequence
desired. DNA or RNA may be extracted from cells, tissues, body fluids and the like as described
above. While nucleic acids for use in the genotyping methods of the invention can be derived from
any mammalian source, the test subjects and individuals from which nucleic acid samples are taken
are generally understood to be human.

Amplification Of DNA Fragments Comprising Biallelic Markers 

Methods and polynucleotides are provided to amplify a segment of nucleotides comprising
one or more biallelic markers of the present invention. It will be appreciated that amplification of
DNA fragments comprising biallelic markers may be used in various methods and for various
purposes and is not restricted to genotyping. Nevertheless, many genotyping methods, although not
all, require the prior amplification of the DNA region carrying the biallelic marker of interest.
Such methods specifically increase the concentration or total number of sequences that span the
biallelic marker or include that site and sequences located either distal or proximal to it. Diagnostic
assays may also rely on amplification of DNA segments carrying a biallelic marker of the present
invention. Amplification of DNA may be achieved by any method known in the art. Amplification
techniques are described above in the section entitled "DNA Amplification".

Some of these amplification methods are particularly suited for the detection of single
nucleotide polymorphisms and allow the simultaneous amplification of a target sequence and the
identification of the polymorphic nucleotide as further described below.

The identification of biallelic markers as described above allows the design of appropriate
oligonucleotides, which can be used as primers to amplify DNA fragments comprising the biallelic
markers of the present invention. Amplification can be performed using the primers initially used
to discover new biallelic markers which are described herein, or any set of primers allowing the
amplification of a DNA fragment comprising a biallelic marker of the present invention.

In some embodiments, the present invention provides primers for amplifying a DNA
fragment containing one or more biallelic markers of the present invention. Preferred amplification
primers are listed in Example 2. It will be appreciated that the primers listed are merely exemplary
and that any other set of primers which produces amplification products containing one or more
biallelic markers of the present invention is also of use.

The spacing of the primers determines the length of the segment to be amplified. In the
context of the present invention, amplified segments carrying biallelic markers can range in size
from about 25 bp to 35 kbp. Amplification fragments of 25-3000 bp are typical, fragments of
50-1000 bp are preferred and fragments of 100-600 bp are highly preferred. It will be appreciated
that amplification primers for the biallelic markers may be any sequence which allows the specific
amplification of any DNA fragment carrying the markers. Amplification primers may be labeled or
immobilized on a solid support as described in the section "Oligonucleotide probes and primers".
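Because the amplified segment is the stretch of template between the two primer sites, its expected size can be checked in silico before any reaction is run. The sketch below is a simplified illustration, not a tool described here: it looks only for exact primer matches on one strand of a plain A/C/G/T sequence (no mismatches, no degenerate bases), the function names are hypothetical, and the size classes simply restate the ranges given above.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, forward_primer: str, reverse_primer: str) -> str | None:
    """Return the segment spanning both primer sites on the template strand, or None."""
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start < 0:
        return None
    # The reverse primer anneals to the template strand, so its site reads as its reverse complement.
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end < 0:
        return None
    return template[start:end + len(reverse_site)]


def size_class(length_bp: int) -> str:
    """Classify an amplicon length using the ranges stated in the text."""
    if 100 <= length_bp <= 600:
        return "highly preferred"
    if 50 <= length_bp <= 1000:
        return "preferred"
    if 25 <= length_bp <= 3000:
        return "typical"
    return "outside the typical range"
```

Moving either primer outward or inward changes the returned segment, and hence its size class, directly, which is the sense in which primer spacing determines the amplified length.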

Methods of Genotyping DNA samples for Biallelic Markers 

Any method known in the art can be used to identify the nucleotide present at a biallelic
marker site. Since the biallelic marker allele to be detected has been identified and specified in the
present invention, one of ordinary skill in the art can readily detect it by employing any of a
number of techniques. Many genotyping methods require the prior amplification of the DNA region
carrying the biallelic marker of interest. While the amplification of target or signal is often
preferred at present, ultrasensitive detection methods which do not require amplification are also
encompassed by the present genotyping methods. Methods well known to those skilled in the art
that can be used to detect biallelic polymorphisms include conventional dot blot analyses, single
strand conformational polymorphism analysis (SSCP) described by Orita et al. (1989), denaturing
gradient gel electrophoresis (DGGE), heteroduplex analysis, mismatch cleavage detection, and
other conventional techniques as described in Sheffield et al. (1991), White et al. (1992), and
Grompe et al. (1989 and 1993). Another method for determining the identity of the nucleotide
present at a particular polymorphic site employs a specialized exonuclease-resistant nucleotide
derivative as described in US patent 4,656,127.

Preferred methods involve directly determining the identity of the nucleotide present at a
biallelic marker site by sequencing assay, enzyme-based mismatch detection assay, or hybridization
assay. The following is a description of some preferred methods. A highly preferred method is the
microsequencing technique. The term "sequencing" is generally used herein to refer to polymerase
extension of duplex primer/template complexes and includes both traditional sequencing and
microsequencing.

1) Sequencing Assays

The nucleotide present at a polymorphic site can be determined by sequencing methods. In
a preferred embodiment, DNA samples are subjected to PCR amplification before sequencing as
described above. DNA sequencing methods are described in the section entitled "Sequencing Of
Amplified Genomic DNA And Identification Of Single Nucleotide Polymorphisms".

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing
reactions using a dye-primer cycle sequencing protocol. Sequence analysis allows the identification
of the base present at the biallelic marker site.

2) Microsequencing Assays 

In microsequencing methods, the nucleotide at a polymorphic site in a target DNA is
detected by a single nucleotide primer extension reaction. This method involves appropriate
microsequencing primers which hybridize just upstream of the polymorphic base of interest in the
target nucleic acid. A polymerase is used to specifically extend the 3' end of the primer with one
single ddNTP (chain terminator) complementary to the nucleotide at the polymorphic site. Next, the
identity of the incorporated nucleotide is determined in any suitable way.
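The geometry of a microsequencing primer can be illustrated on a reference sequence. In this hypothetical sketch, `top` is one strand of the reference written 5' to 3' and `pos` is the 0-based index of the polymorphic base. A primer identical to the bases immediately 5' of the site anneals to the opposite strand and is extended by a ddNTP identical to the allele on `top`; a primer taken from the other strand (the reverse complement of the bases immediately 3' of the site) is extended by the complementary ddNTP. The primer length and function names are illustrative assumptions.

```python
from __future__ import annotations

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def microsequencing_primers(top: str, pos: int, length: int = 20) -> tuple[str, str]:
    """Return (upper, lower) primers whose 3' ends lie immediately next to top[pos]."""
    if pos - length < 0 or pos + 1 + length > len(top):
        raise ValueError("not enough flanking sequence for the requested primer length")
    upper = top[pos - length:pos]                              # same sequence as `top`
    lower = reverse_complement(top[pos + 1:pos + 1 + length])  # same sequence as the other strand
    return upper, lower


def incorporated_ddntp(allele_on_top: str, primer: str) -> str:
    """Base added by single-nucleotide extension for a given allele and primer orientation."""
    if primer == "upper":
        return allele_on_top
    if primer == "lower":
        return COMPLEMENT[allele_on_top]
    raise ValueError("primer must be 'upper' or 'lower'")
```

Either orientation reports the same allele; the choice between them is usually made on practical grounds such as the flanking sequence available for primer design.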

Typically, microsequencing reactions are carried out using fluorescent ddNTPs, and the
extended microsequencing primers are analyzed by electrophoresis on ABI 377 sequencing
machines to determine the identity of the incorporated nucleotide as described in EP 412 883.
Alternatively, capillary electrophoresis can be used in order to process a higher number of assays
simultaneously. An example of a typical microsequencing procedure that can be used in the context
of the present invention is provided in Example 4.

Different approaches can be used for the labeling and detection of ddNTPs. A
homogeneous phase detection method based on fluorescence resonance energy transfer has been
described by Chen and Kwok (1997) and Chen et al. (1997). In this method, amplified genomic
DNA fragments containing polymorphic sites are incubated with a 5'-fluorescein-labeled primer in
the presence of allelic dye-labeled dideoxyribonucleoside triphosphates and a modified Taq
polymerase. The dye-labeled primer is extended one base by the dye-terminator specific for the
allele present on the template. At the end of the genotyping reaction, the fluorescence intensities of
the two dyes in the reaction mixture are analyzed directly without separation or purification. All
these steps can be performed in the same tube and the fluorescence changes can be monitored in
real time. Alternatively, the extended primer may be analyzed by MALDI-TOF mass
spectrometry. The base at the polymorphic site is identified by the mass added onto the
microsequencing primer (see Haff and Smirnov, 1997).
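In homogeneous formats such as the fluorescence-based assay above, the genotype is read from the relative signal of the two allele-specific dyes. The following is a hypothetical calling rule, not the published procedure: it subtracts a per-dye background, refuses to call weak reactions, and uses an illustrative band of dye-1 signal fractions for heterozygotes. The parameters `min_signal` and `het_band` are assumptions; real thresholds would be set from control samples of known genotype.

```python
from __future__ import annotations


def call_genotype(dye_1: float, dye_2: float,
                  background: tuple[float, float] = (0.0, 0.0),
                  min_signal: float = 100.0,
                  het_band: tuple[float, float] = (0.3, 0.7)) -> str:
    """Call a biallelic genotype from the fluorescence of two allele-specific dyes."""
    signal_1 = max(dye_1 - background[0], 0.0)
    signal_2 = max(dye_2 - background[1], 0.0)
    total = signal_1 + signal_2
    if total < min_signal:
        return "no call"                  # too little extension product to decide
    fraction_1 = signal_1 / total         # share of the signal coming from allele 1
    low, high = het_band
    if fraction_1 > high:
        return "allele 1 homozygote"
    if fraction_1 < low:
        return "allele 2 homozygote"
    return "heterozygote"
```

Using the signal fraction rather than absolute intensities makes the call less sensitive to well-to-well differences in the total amount of product.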

Microsequencing may be achieved by the established microsequencing method or by
developments or derivatives thereof. Alternative methods include several solid-phase
microsequencing techniques. The basic microsequencing protocol is the same as described
previously, except that the method is conducted as a heterogeneous phase assay, in which the primer
or the target molecule is immobilized or captured onto a solid support. To simplify the primer
separation and the terminal nucleotide addition analysis, oligonucleotides are attached to solid
supports or are modified in ways that permit affinity separation as well as polymerase
extension. The 5' ends and internal nucleotides of synthetic oligonucleotides can be modified in a
number of different ways to permit different affinity separation approaches, e.g., biotinylation. If a
single affinity group is used on the oligonucleotides, the oligonucleotides can be separated from the
incorporated terminator reagent. This eliminates the need for physical or size separation. More than
one oligonucleotide can be separated from the terminator reagent and analyzed simultaneously if
more than one affinity group is used. This permits the analysis of several nucleic acid species or
more nucleic acid sequence information per extension reaction. The affinity group need not be on
the priming oligonucleotide but could alternatively be present on the template. For example,
immobilization can be carried out via an interaction between biotinylated DNA and
streptavidin-coated microtitration wells or avidin-coated polystyrene particles. In the same manner,
oligonucleotides or templates may be attached to a solid support in a high-density format. In such
solid phase microsequencing reactions, incorporated ddNTPs can be radiolabeled (Syvanen, 1994)
or linked to fluorescein (Livak and Hainer, 1994). The detection of radiolabeled ddNTPs can be
achieved through scintillation-based techniques. The detection of fluorescein-linked ddNTPs can
be based on the binding of antifluorescein antibody conjugated with alkaline phosphatase, followed
by incubation with a chromogenic substrate (such as p-nitrophenyl phosphate). Other possible
reporter-detection pairs include: ddNTP linked to dinitrophenyl (DNP) and anti-DNP alkaline
phosphatase conjugate (Harju et al., 1993) or biotinylated ddNTP and horseradish
peroxidase-conjugated streptavidin with o-phenylenediamine as a substrate (WO 92/15712). As yet
another alternative solid-phase microsequencing procedure, Nyren et al. (1993) described a method
relying on the detection of DNA polymerase activity by an enzymatic luminometric inorganic
pyrophosphate detection assay (ELIDA).

Pastinen et al. (1997) describe a method for multiplex detection of single nucleotide
polymorphisms in which the solid phase minisequencing principle is applied to an oligonucleotide
array format. High-density arrays of DNA probes attached to a solid support (DNA chips) are
further described below.

In one aspect the present invention provides polynucleotides and methods to genotype one
or more biallelic markers of the present invention by performing a microsequencing assay.
Preferred microsequencing primers include the nucleotide sequences D1 to D4 and D6 to D80 and
E1 to E4 and E6 to E80. It will be appreciated that the microsequencing primers listed in Example
4 are merely exemplary and that any primer having a 3' end immediately adjacent to the
polymorphic nucleotide may be used. Similarly, it will be appreciated that microsequencing
analysis may be performed for any biallelic marker or any combination of biallelic markers of the
present invention. One aspect of the present invention is a solid support which includes one or
more microsequencing primers listed in Example 4, or fragments comprising at least 8, 12, 15, 20,
25, 30, 40, or 50 consecutive nucleotides thereof, to the extent that such lengths are consistent with
the primer described, and having a 3' terminus immediately upstream of the corresponding biallelic
marker, for determining the identity of a nucleotide at a biallelic marker site.
3) Mismatch detection assays based on polymerases and ligases 
In one aspect the present invention provides polynucleotides and methods to determine the
allele of one or more biallelic markers of the present invention in a biological sample, by mismatch
detection assays based on polymerases and/or ligases. These assays are based on the specificity of
polymerases and ligases. Polymerization reactions place particularly stringent requirements on
correct base pairing of the 3' end of the amplification primer, and the joining of two oligonucleotides
hybridized to a target DNA sequence is quite sensitive to mismatches close to the ligation site,
especially at the 3' end. Methods, primers and various parameters to amplify DNA fragments
comprising biallelic markers of the present invention are further described above in the section
entitled "Amplification Of DNA Fragments Comprising Biallelic Markers".
Allele Specific Amplification Primers 

Discrimination between the two alleles of a biallelic marker can also be achieved by allele
specific amplification, a selective strategy whereby one of the alleles is amplified without
amplification of the other allele. For allele specific amplification, at least one member of the pair of
primers is sufficiently complementary with a region of a PG-3 gene comprising the polymorphic
base of a biallelic marker of the present invention to hybridize therewith and to initiate the
amplification. Such primers are able to discriminate between the two alleles of a biallelic marker.
This is accomplished by placing the polymorphic base at the 3' end of one of the
amplification primers. Because the extension progresses from the 3' end of the primer, a mismatch
at or near this position has an inhibitory effect on amplification. Therefore, under appropriate
amplification conditions, these primers only direct amplification on their complementary allele.
Determining the precise location of the mismatch and the corresponding assay conditions is well
within the ordinary skill in the art.
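The principle of placing the polymorphic base at the 3' end of the primer can be sketched with an idealized matching model. This is an assumption-laden illustration: it treats amplification as occurring only when the primer matches the template exactly over its full length, whereas in practice discrimination comes mainly from the 3'-terminal mismatch and depends on reaction conditions. `top`, `pos` and the function names are hypothetical.

```python
from __future__ import annotations


def allele_specific_primers(top: str, pos: int, alleles: tuple[str, str],
                            length: int = 20) -> dict[str, str]:
    """One primer per allele, each ending (3' end) on the polymorphic base top[pos]."""
    if pos - length + 1 < 0:
        raise ValueError("not enough upstream sequence for the requested primer length")
    stem = top[pos - length + 1:pos]
    return {allele: stem + allele for allele in alleles}


def directs_amplification(primer: str, sample_top: str, pos: int) -> bool:
    """Idealized rule: extension only if the primer, including its 3' base, matches the sample."""
    start = pos - len(primer) + 1
    return start >= 0 and sample_top[start:pos + 1] == primer
```

Each primer of the pair therefore reports on one allele only: in a heterozygous sample both primers direct amplification, in a homozygous sample only one does.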

Ligation/Amplification Based Methods

The "Oligonucleotide Ligation Assay" (OLA) uses two oligonucleotides which are
designed to be capable of hybridizing to abutting sequences of a single strand of a target molecule.
One of the oligonucleotides is biotinylated, and the other is detectably labeled. If the precise
complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that
their termini abut, and create a ligation substrate that can be captured and detected. OLA is capable
of detecting single nucleotide polymorphisms and may be advantageously combined with PCR as
described by Nickerson et al. (1990). In this method, PCR is used to achieve the exponential
amplification of target DNA, which is then detected using OLA.

Other amplification methods which are particularly suited for the detection of single
nucleotide polymorphisms include LCR (ligase chain reaction) and Gap LCR (GLCR), which are
described above in the section entitled "DNA Amplification". LCR uses two pairs of probes to
exponentially amplify a specific target. The sequences of each pair of oligonucleotides are selected
to permit the pair to hybridize to abutting sequences of the same strand of the target. Such
hybridization forms a substrate for a template-dependent ligase. In accordance with the present
invention, LCR can be performed with oligonucleotides having the proximal and distal sequences of
the same strand of a biallelic marker site. In one embodiment, either oligonucleotide will be
designed to include the biallelic marker site. In such an embodiment, the reaction conditions are
selected such that the oligonucleotides can be ligated together only if the target molecule either
contains or lacks the specific nucleotide that is complementary to the biallelic marker on the
oligonucleotide. In an alternative embodiment, the oligonucleotides will not include the biallelic
marker, such that when they hybridize to the target molecule, a "gap" is created as described in WO
90/01069. This gap is then "filled" with complementary dNTPs (as mediated by DNA polymerase),
or by an additional pair of oligonucleotides. Thus, at the end of each cycle, each single strand has a
complement capable of serving as a target during the next cycle, and exponential allele-specific
amplification of the desired sequence is obtained.

Ligase/Polymerase-mediated Genetic Bit Analysis™ is another method for determining the
identity of a nucleotide at a preselected site in a nucleic acid molecule (WO 95/21271). This
method involves the incorporation of a nucleoside triphosphate that is complementary to the
nucleotide present at the preselected site onto the terminus of a primer molecule, and its
subsequent ligation to a second oligonucleotide. The reaction is monitored by detecting a specific
label attached to the reaction's solid phase or by detection in solution.
4) Hybridization Assay Methods 

A preferred method of determining the identity of the nucleotide present at a biallelic
marker site involves nucleic acid hybridization. The hybridization probes, which can be
conveniently used in such reactions, preferably include the probes defined herein. Any
hybridization assay may be used, including Southern hybridization, Northern hybridization, dot blot
hybridization and solid-phase hybridization (see Sambrook et al., 1989).

Hybridization refers to the formation of a duplex structure by two single stranded nucleic 
acids due to complementary base pairing. Hybridization can occur between exactly complementary 
nucleic acid strands or between nucleic acid strands that contain minor regions of mismatch. 

30 Specific probes can be designed that hybridize to one form of a biallelic marker and not to the other 
and therefore are able to discriminate between different allelic forms. Allele-specific probes are 
often used in pairs, one member of a pair showing perfect match to a target sequence containing the 
original allele and the other showing a perfect match to the target sequence containing the 
alternative allele. Hybridization conditions should be sufficiently stringent that there is a significant 

35 difference in hybridization intensity between alleles, and preferably an essentially binary response, 
whereby a probe hybridizes to only one of the alleles. Stringent, sequence specific hybridization 
c nditions, under which a probe will hybridize only to the exactly complementary target sequence 
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are well known in the art (Sambrook et al. f 1989). Stringent conditions are sequence dependent and 
will be different in different circumstances. Generally, stringent conditions are selected to be about 
5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength 
and pH. Although such hybridization can be performed in solution, it is preferred to employ a 

5 solid-phase hybridization assay. The target DNA comprising a biallelic marker of the present 
invention may be amplified prior to the hybridization reaction. The presence of a specific allele in 
the sample is determined by detecting the presence or the absence of stable hybrid duplexes formed 
between the probe and the target DNA. The detection of hybrid duplexes can be carried out by a 
number of methods. Various well-known detection assay formats use detectable labels
bound to either the target or the probe to enable detection of the hybrid duplexes. Typically,
hybridization duplexes are separated from unhybridized nucleic acids and the labels bound to the
duplexes are then detected. Those skilled in the art will recognize that wash steps may be employed
to wash away excess target DNA or probe as well as unbound conjugate. Further, standard
heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the
primers and probes.
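As a rough illustration of the rule of thumb above (hybridize about 5°C below Tm), the sketch below estimates Tm with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. This rule is an assumption chosen for brevity: it is only a rough approximation for short oligonucleotides (roughly 14 to 20 nucleotides) and ignores the ionic strength and pH on which, as noted above, Tm depends. The function names are hypothetical.

```python
def wallace_tm(probe: str) -> float:
    """Approximate Tm (deg C) of a short DNA oligonucleotide by the Wallace rule.

    Assumption: Tm ~= 2*(A+T) + 4*(G+C). Salt concentration, pH and
    nearest-neighbour effects are ignored, so this is only a first estimate.
    """
    seq = probe.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("probe must be a non-empty A/C/G/T sequence")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_hybridization_temp(probe: str, offset_c: float = 5.0) -> float:
    """Hybridization temperature set `offset_c` degrees below the estimated Tm."""
    return wallace_tm(probe) - offset_c


probe = "ACGTACGTACGTACGT"  # hypothetical 16-mer: 8 A/T and 8 G/C
print(wallace_tm(probe))                    # 48.0
print(stringent_hybridization_temp(probe))  # 43.0
```

For the 25-nucleotide probes preferred in this section, a nearest-neighbour Tm model evaluated at the actual salt concentration would be a more appropriate choice.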

Two recently developed assays allow hybridization-based allele discrimination with no
need for separations or washes (see Landegren U. et al., 1998). The TaqMan assay takes advantage
of the 5' nuclease activity of Taq DNA polymerase to digest a DNA probe annealed specifically to
the accumulating amplification product. TaqMan probes are labeled with a donor-acceptor dye pair
that interacts via fluorescence energy transfer. Cleavage of the TaqMan probe by the advancing
polymerase during amplification dissociates the donor dye from the quenching acceptor dye, greatly
increasing the donor fluorescence. All reagents necessary to detect two allelic variants can be
assembled at the beginning of the reaction and the results are monitored in real time (see Livak et
al., 1995). In an alternative homogeneous hybridization-based procedure, molecular beacons are
used for allele discrimination. Molecular beacons are hairpin-shaped oligonucleotide probes that
report the presence of specific nucleic acids in homogeneous solutions. When they bind to their
targets they undergo a conformational reorganization that restores the fluorescence of an internally
quenched fluorophore (Tyagi et al., 1998).

The polynucleotides provided herein can be used to produce probes which can be used in
hybridization assays for the detection of biallelic marker alleles in biological samples. These probes
preferably comprise between 8 and 50 nucleotides and are sufficiently complementary to a sequence
comprising a biallelic marker of the present invention to hybridize thereto, and are preferably
sufficiently specific to discriminate between target sequences that differ at a single nucleotide.
A particularly preferred probe is 25 nucleotides in length. Preferably the biallelic marker
is within 4 nucleotides of the center of the polynucleotide probe. In particularly preferred probes,
the biallelic marker is at the center of said polynucleotide. Preferred probes comprise a nucleotide
sequence selected from the group consisting of amplicons listed in Table 1 and the sequences
complementary thereto, or a fragment thereof, said fragment comprising at least about 8 consecutive
nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50 consecutive nucleotides and
containing a polymorphic base. Preferred probes comprise a nucleotide sequence selected from the
group consisting of P1 to P4 and P6 to P80 and the sequences complementary thereto. In preferred
embodiments the polymorphic base(s) are within 5, 4, 3, 2, or 1 nucleotides of the center of said
polynucleotide, more preferably at the center of said polynucleotide.
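The preferred probe geometry described above (an allele-specific pair of, for example, 25-nucleotide probes that differ only at a centrally placed polymorphic base) can be sketched as follows. The flanking sequences, allele letters and function names are hypothetical; the reverse complement is included only because the text also covers the complementary sequences.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def allele_specific_pair(left: str, right: str, allele1: str, allele2: str,
                         length: int = 25) -> tuple[str, str]:
    """Build two probes of `length` nt that differ only at the central polymorphic base.

    `left` and `right` are the sequences immediately 5' and 3' of the biallelic
    marker (hypothetical inputs). `length` must be odd so that the marker sits
    exactly at the centre of each probe.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd to centre the polymorphic base")
    flank = (length - 1) // 2
    if len(left) < flank or len(right) < flank:
        raise ValueError(f"need at least {flank} nt of flanking sequence on each side")
    core_left, core_right = left[-flank:].upper(), right[:flank].upper()
    return (core_left + allele1.upper() + core_right,
            core_left + allele2.upper() + core_right)


left = "TTGACCTGAAGCTTAG"   # hypothetical 5' flank
right = "CATGGTACCAGTCAAT"  # hypothetical 3' flank
probe_a, probe_g = allele_specific_pair(left, right, "A", "G")
# Both probes are 25 nt long and differ only at index 12, the polymorphic base.
complementary_probes = (reverse_complement(probe_a), reverse_complement(probe_g))
```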

Preferably the probes of the present invention are labeled or immobilized on a solid support.
Labels and solid supports are further described in the section entitled "Oligonucleotide Probes and
Primers". The probes can be non-extendable as described in the section entitled "Oligonucleotide
Probes and Primers".

By assaying the hybridization to an allele specific probe, one can detect the presence or 
absence of a biallelic marker allele in a given sample. High-throughput parallel hybridization in
array format is specifically encompassed within "hybridization assays" and is described below. 
5) Hybridization To Addressable Arrays Of Oligonucleotides 

Hybridization assays based on oligonucleotide arrays rely on the differences in
hybridization stability of short oligonucleotides to perfectly matched and mismatched target
sequence variants. Efficient access to polymorphism information is obtained through a basic
structure comprising high-density arrays of oligonucleotide probes attached to a solid support (e.g.,
the chip) at selected positions. Each DNA chip can contain thousands to millions of individual
synthetic DNA probes arranged in a grid-like pattern and miniaturized to the size of a dime.

The chip technology has already been applied with success in numerous cases. For
example, the screening of mutations has been undertaken in the BRCA1 gene, in S. cerevisiae
mutant strains, and in the protease gene of HIV-1 virus (Hacia et al., 1996; Shoemaker et al., 1996;
Kozal et al., 1996). Chips of various formats for use in detecting biallelic polymorphisms can be
produced on a customized basis by Affymetrix (GeneChip™), Hyseq (HyChip and HyGnostics),
and Protogene Laboratories.

In general, these methods employ arrays of oligonucleotide probes that are complementary
to target nucleic acid sequence segments from an individual, where the target sequences include a
polymorphic marker. EP 785280 describes a tiling strategy for the detection of single nucleotide
polymorphisms. Briefly, arrays may generally be "tiled" for a large number of specific
polymorphisms. By "tiling" is generally meant the synthesis of a defined set of oligonucleotide
probes which is made up of a sequence complementary to the target sequence of interest, as well as
preselected variations of that sequence, e.g., substitution of one or more given positions with one or
more members of the basis set of nucleotides. Tiling strategies are further described in PCT
application No. WO 95/11995. In a particular aspect, arrays are tiled for a number of specific,
identified biallelic marker sequences. In particular, the array is tiled to include a number of
detection blocks, each detection block being specific for a specific biallelic marker or a set of
biallelic markers. For example, a detection block may be tiled to include a number of probes, which
span the sequence segment that includes a specific polymorphism. To obtain probes that are
complementary to each allele, the probes are synthesized in pairs differing at the biallelic marker.
In addition to the probes differing at the polymorphic base, monosubstituted probes are also
generally tiled within the detection block. In each monosubstituted probe, one base, located at the
polymorphism or up to a certain number of bases in either direction from it, is substituted with one
of the remaining nucleotides (selected from A, T, G, C and U). Typically the probes in a tiled
detection block will include substitutions of the sequence positions up to and including those that
are 5 bases away from the biallelic marker. The monosubstituted probes provide internal controls
for the tiled array, to distinguish actual hybridization from artefactual cross-hybridization. Upon
completion of hybridization with the target sequence and washing of the array, the array is scanned
to determine the position on the array to which the target sequence hybridizes. The hybridization
data from the scanned array is then analyzed to identify which allele or alleles of the biallelic
marker are present in the sample. Hybridization and scanning may be carried out as described in
PCT application Nos. WO 92/10092 and WO 95/11995 and US Patent No. 5,424,186.
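A detection block of the kind described above (two perfect-match probes that differ at the biallelic marker, plus monosubstituted probes covering the marker and up to 5 positions on either side) could be enumerated as below. This is an illustrative sketch, not the procedure of EP 785280 or WO 95/11995; the 15-nucleotide probe length, the DNA-only alphabet (no U), the flanking sequences and the function name are all assumptions.

```python
def tile_detection_block(left: str, right: str, alleles: tuple[str, str],
                         probe_len: int = 15, max_offset: int = 5) -> dict[str, str]:
    """Return {probe_sequence: description} for one tiled detection block.

    Assumptions: DNA probes over A/C/G/T, an odd `probe_len` with the biallelic
    marker at the centre, and single-base substitutions at every position up to
    and including `max_offset` bases from the marker.
    """
    flank = (probe_len - 1) // 2
    if probe_len % 2 == 0 or max_offset > flank:
        raise ValueError("probe_len must be odd and max_offset <= (probe_len - 1) // 2")
    if len(left) < flank or len(right) < flank:
        raise ValueError(f"need at least {flank} nt of flanking sequence on each side")
    core_left, core_right = left[-flank:].upper(), right[:flank].upper()

    # Perfect-match probes for each allele are added first so their labels win.
    perfect = {a: core_left + a + core_right for a in alleles}
    block = {seq: f"perfect match, allele {a}" for a, seq in perfect.items()}

    # Monosubstituted control probes derived from each perfect-match probe.
    for allele, probe in perfect.items():
        for offset in range(-max_offset, max_offset + 1):
            pos = flank + offset
            for base in "ACGT":
                if base == probe[pos]:
                    continue
                variant = probe[:pos] + base + probe[pos + 1:]
                # setdefault keeps the first label when two derivations coincide.
                block.setdefault(variant, f"allele {allele} probe, {base} at offset {offset:+d}")
    return block


block = tile_detection_block("AAAAAAAAAA", "CCCCCCCCCC", ("G", "T"))
print(len(block))  # 64
```

Duplicates are collapsed because a substitution at the marker position of one allele's probe reproduces either the other allele's perfect-match probe or a substitution already generated from it.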

Thus, in some embodiments, the chips may comprise an array of nucleic acid sequences
about 15 nucleotides in length. In further embodiments, the chip may comprise an array including
at least one of the sequences selected from the group consisting of amplicons listed in Table 1 and
the sequences complementary thereto, or a fragment thereof, said fragment comprising at least
about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50
consecutive nucleotides and containing a polymorphic base. In preferred embodiments the
polymorphic base is within 5, 4, 3, 2, or 1 nucleotides of the center of said polynucleotide, more
preferably at the center of said polynucleotide. In some embodiments, the chip may comprise an
array of at least 2, 3, 4, 5, 6, 7, 8 or more of these polynucleotides of the invention. Solid supports
and polynucleotides of the present invention attached to solid supports are further described in the
section entitled "Oligonucleotide Probes And Primers".
6) Integrated Systems 

Another technique that may be used to analyze polymorphisms involves
multicomponent integrated systems, which miniaturize and compartmentalize processes such as
PCR and capillary electrophoresis reactions in a single functional device. An example of such a
technique is disclosed in US patent 5,589,136, which describes the integration of PCR amplification
and capillary electrophoresis in chips.

Integrated systems can be envisaged mainly when microfluidic systems are used. These
systems comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic wafer
included on a microchip. The movements of the samples are controlled by electric, electroosmotic
or hydrostatic forces applied across different areas of the microchip to create functional microscopic
valves and pumps with no moving parts.

For genotyping biallelic markers, the microfluidic system may integrate nucleic acid
amplification, microsequencing, capillary electrophoresis and a detection method such as
laser-induced fluorescence detection.

METHODS OF GENETIC ANALYSIS USING THE BIALLELIC MARKERS OF 



Different methods are available for the genetic analysis of complex traits (see Lander and
Schork, 1994). The search for disease-susceptibility genes is conducted using two main methods:
the linkage approach, in which evidence is sought for cosegregation between a locus and a putative
trait locus using family studies, and the association approach, in which evidence is sought for a
statistically significant association between an allele and a trait or a trait-causing allele (Khoury et
al., 1993). In general, the biallelic markers of the present invention find use in any method known
in the art to demonstrate a statistically significant correlation between a genotype and a phenotype.
The biallelic markers may be used in parametric and non-parametric linkage analysis methods.
Preferably, the biallelic markers of the present invention are used to identify genes associated with
detectable traits using association studies, an approach which does not require the use of affected
families and which permits the identification of genes associated with complex and sporadic traits.

The genetic analysis using the biallelic markers of the present invention may be conducted
on any scale. The whole set of biallelic markers of the present invention or any subset of biallelic
markers of the present invention corresponding to the candidate gene may be used. Further, any set
of genetic markers including a biallelic marker of the present invention may be used. A set of
biallelic polymorphisms that could be used as genetic markers in combination with the biallelic
markers of the present invention has been described in WO 98/20165. As mentioned above, it
should be noted that the biallelic markers of the present invention may be included in any complete
or partial genetic map of the human genome. These different uses are specifically contemplated in
the present invention and claims.
Linkage Analysis 

Linkage analysis is based upon establishing a correlation between the transmission of 
genetic markers and that of a specific trait throughout generations within a family. Thus, the aim of 
linkage analysis is to detect marker loci that show cosegregation with a trait of interest in pedigrees. 



When data are available from successive generations there is the opportunity to study the
degree of linkage between pairs of loci. Estimates of the recombination fraction enable loci to be
ordered and placed onto a genetic map. With loci that are genetic markers, a genetic map can be
established, and then the strength of linkage between markers and traits can be calculated and used
to indicate the relative positions of markers and genes affecting those traits (Weir, 1996). The
classical method for linkage analysis is the logarithm of odds (lod) score method (see Morton, 1955;
Ott, 1991). Calculation of lod scores requires specification of the mode of inheritance for the
disease (parametric method). Generally, the length of the candidate region identified using linkage
analysis is between 2 and 20 Mb. Once a candidate region is identified as described above, analysis
of recombinant individuals using additional markers allows further delineation of the candidate
region. Linkage analysis studies have generally relied on the use of a maximum of 5,000
microsatellite markers, thus limiting the maximum theoretical attainable resolution of linkage
analysis to about 600 kb on average (roughly 3,000 Mb of genome divided by 5,000 markers).
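For the simplest parametric case, phase-known and fully informative meioses scored as recombinant (R) or non-recombinant (NR) between a marker and the trait locus, the lod score at recombination fraction θ is log10[θ^R (1 - θ)^NR / 0.5^(R+NR)], the base-10 logarithm of the likelihood ratio of linkage at θ versus free recombination. The sketch below assumes only that simple setting; real pedigree analyses must also handle unknown phase, missing genotypes and penetrance. The function names and the example counts are hypothetical. Lod scores of 3 or more are conventionally taken as evidence of linkage.

```python
import math


def lod_score(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """Lod score for linkage at recombination fraction `theta` versus theta = 0.5.

    Assumes phase-known, fully informative meioses:
        LOD(theta) = log10(theta**R * (1 - theta)**NR / 0.5**(R + NR))
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if recombinants > 0 and theta == 0.0:
        return float("-inf")  # a single recombinant rules out theta = 0
    log_lik = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_lik += recombinants * math.log10(theta)
    return log_lik - (recombinants + nonrecombinants) * math.log10(0.5)


def max_lod(recombinants: int, nonrecombinants: int) -> tuple[float, float]:
    """Maximum-likelihood recombination fraction (capped at 0.5) and its lod score."""
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("need at least one informative meiosis")
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, lod_score(theta_hat, recombinants, nonrecombinants)


theta_hat, z = max_lod(2, 18)  # hypothetical: 2 recombinants in 20 meioses
print(theta_hat, round(z, 2))  # 0.1 3.2
```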

Linkage analysis has been successfully applied to map simple genetic traits that show clear
Mendelian inheritance patterns and which have a high penetrance (i.e., the ratio between the number
of trait positive carriers of allele a and the total number of a carriers in the population). However,
parametric linkage analysis suffers from a variety of drawbacks. First, it is limited by its reliance on
the choice of a genetic model suitable for each studied trait. Furthermore, as already mentioned, the
resolution attainable using linkage analysis is limited, and complementary studies are required to
refine the analysis of the typical 2 Mb to 20 Mb regions initially identified through linkage analysis.
In addition, parametric linkage analysis approaches have proven difficult when applied to complex
genetic traits, such as those due to the combined action of multiple genes and/or environmental
factors. It is very difficult to model these factors adequately in a lod score analysis. In such cases,
recruiting the number of affected families needed to apply linkage analysis requires too great an
effort and cost, as recently discussed by Risch, N. and Merikangas, K. (1996).

NON-PARAMETRIC METHODS

The advantage of the so-called non-parametric methods for linkage analysis is that they do
not require specification of the mode of inheritance for the disease; consequently, they tend to be
more useful for the analysis of complex traits. In non-parametric methods, one tries to prove that the
inheritance pattern of a chromosomal region is not consistent with random Mendelian segregation by
showing that affected relatives inherit identical copies of the region more often than expected by
chance. Affected relatives should show excess "allele sharing" even in the presence of incomplete
penetrance and polygenic inheritance. In non-parametric linkage analysis the degree of agreement
at a marker locus in two individuals can be measured either by the number of alleles identical by
state (IBS) or by the number of alleles identical by descent (IBD). Affected sib pair analysis is a
well-known special case and is the simplest form of these methods.
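Two quantities mentioned above can be sketched directly: the number of alleles two individuals share identical by state at one biallelic marker, and a simple allele-sharing statistic for affected sib pairs. The statistic assumes IBD counts have already been inferred (which in practice requires parental or multipoint information) and compares them with the 1/4, 1/2, 1/4 distribution expected for sibs under no linkage. It is a basic goodness-of-fit test chosen for illustration, not a method prescribed by the text, and it responds to any departure from expectation rather than specifically to excess sharing. Function names and counts are hypothetical.

```python
import math
from collections import Counter


def ibs_count(genotype1: tuple[str, str], genotype2: tuple[str, str]) -> int:
    """Number of alleles (0, 1 or 2) two individuals share identical by state at one marker."""
    return sum((Counter(genotype1) & Counter(genotype2)).values())


def asp_sharing_chi2(ibd_counts: tuple[int, int, int]) -> tuple[float, float]:
    """Goodness-of-fit chi-square (2 df) for affected-sib-pair IBD sharing.

    `ibd_counts` holds the numbers of sib pairs sharing 0, 1 and 2 alleles IBD
    (assumed already inferred). Under no linkage the expected proportions are
    1/4, 1/2 and 1/4.
    """
    n = sum(ibd_counts)
    expected = (n / 4, n / 2, n / 4)
    chi2 = sum((obs - exp) ** 2 / exp for obs, exp in zip(ibd_counts, expected))
    p_value = math.exp(-chi2 / 2)  # upper tail of a chi-square with 2 df
    return chi2, p_value


print(ibs_count(("A", "G"), ("G", "A")))  # 2
print(asp_sharing_chi2((10, 50, 40)))     # hypothetical counts for 100 affected sib pairs
```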

The biallelic markers of the present invention may be used in both parametric and
non-parametric linkage analysis. Preferably, the biallelic markers are used in non-parametric methods,
which allow the mapping of genes involved in complex traits. The biallelic markers of the present
invention may be used in both IBD- and IBS-based methods to map genes affecting a complex trait.
In such studies, taking advantage of the high density of biallelic markers, several adjacent biallelic
marker loci may be pooled to achieve the efficiency attained by multi-allelic markers (Zhao et al.,
1998).
Population Association Studies

The present invention comprises methods for detecting an association between the PG-3 
gene and a detectable trait using the biallelic markers of the present invention. In one embodiment 
the present invention comprises methods to detect an association between a biallelic marker allele
or a biallelic marker haplotype and a trait. Further, the invention comprises methods to identify a
trait-causing allele in linkage disequilibrium with any biallelic marker allele of the present
invention.

As described above, alternative approaches can be employed to perform association studies:
genome-wide association studies, candidate region association studies and candidate gene
association studies. In a preferred embodiment, the biallelic markers of the present invention are
used to perform candidate gene association studies. The candidate gene analysis clearly provides a
short-cut approach to the identification of genes and gene polymorphisms related to a particular trait
when some information concerning the biology of the trait is available. Further, the biallelic
markers of the present invention may be incorporated in any map of genetic markers of the human
genome in order to perform genome-wide association studies. Methods to generate a high-density
map of biallelic markers have been described in US Provisional Patent application serial number
60/082,614. The biallelic markers of the present invention may further be incorporated in any map
of a specific candidate region of the genome (a specific chromosome or a specific chromosomal
segment, for example).

As mentioned above, association studies may be conducted within the general population
and are not limited to studies performed on related individuals in affected families. Association
studies are extremely valuable as they permit the analysis of sporadic or multifactorial traits.
Moreover, association studies are a powerful method for fine-scale mapping, locating trait-causing
alleles much more precisely than linkage studies, whereas studies based on pedigrees often narrow
the location of the trait-causing allele only coarsely. Association studies using the biallelic markers
of the present invention can therefore be used to refine the location of a trait-causing allele in a
candidate region identified by linkage analysis methods. Moreover, once a chromosome segment of
interest has been identified, the presence of a candidate gene, such as a candidate gene of the present
invention, in the region of interest can provide a shortcut to the identification of the trait-causing
allele. Biallelic markers of the present invention can be used to demonstrate that a candidate gene is
associated with a trait. Such uses are specifically contemplated in the present invention.

Determining The Frequency Of A Biallelic Marker Allele Or Of A Biallelic Marker 
Haplotype In A Population 

Association studies explore the relationships between the frequencies of sets of alleles at
different loci.

DETERMINING THE FREQUENCY OF AN ALLELE IN A POPULATION
Allelic frequencies of the biallelic markers in a population can be determined using one of
the methods described above under the heading "Methods for genotyping an individual for biallelic
markers", or any genotyping procedure suitable for this intended purpose. The frequency of a
biallelic marker allele in a population can be determined by genotyping either pooled samples or
individual samples. One way to reduce the number of genotypings required is to use pooled samples.
A drawback of pooled samples is the difficulty of determining DNA concentrations accurately and
reproducibly when setting up the pools. Genotyping individual samples provides higher sensitivity,
reproducibility and accuracy and is the preferred method used in the present invention. Preferably,
each individual is genotyped separately and simple gene counting is applied to determine the
frequency of an allele of a biallelic marker or of a genotype in a given population.
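Simple gene counting, as preferred above, amounts to tallying the 2N alleles typed in N individuals. A minimal sketch, assuming each individual's genotype at one biallelic marker is given as a pair of allele letters (a hypothetical input format; the function names are also illustrative):

```python
from collections import Counter


def allele_frequencies(genotypes: list[tuple[str, str]]) -> dict[str, float]:
    """Gene counting: frequency of each allele among all 2N typed chromosomes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def genotype_frequencies(genotypes: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Frequency of each unordered genotype (A/G is counted together with G/A)."""
    counts = Counter(tuple(sorted(genotype)) for genotype in genotypes)
    return {genotype: n / len(genotypes) for genotype, n in counts.items()}


sample = [("A", "A"), ("A", "G"), ("G", "A"), ("G", "G")]  # hypothetical genotypes
print(allele_frequencies(sample))    # {'A': 0.5, 'G': 0.5}
print(genotype_frequencies(sample))  # {('A', 'A'): 0.25, ('A', 'G'): 0.5, ('G', 'G'): 0.25}
```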
The invention also relates to methods of estimating the frequency of an allele in a
population comprising: a) genotyping individuals from said population for said biallelic marker
according to the method of the present invention; b) determining the proportional representation of
said biallelic marker in said population. In addition, the methods of estimating the frequency of an
allele in a population of the invention encompass methods with any further limitation described in
this disclosure, or those following, specified alone or in any combination: optionally, the
PG-3-related biallelic marker is selected from the group consisting of A1 to A80, and the complements
thereof, or optionally the biallelic marker is one of the biallelic markers in linkage disequilibrium
therewith; optionally, wherein said PG-3-related biallelic marker is selected from the group
consisting of A1 to A5 and A8 to A80, and the complements thereof, or optionally the biallelic
markers in linkage disequilibrium therewith; optionally, wherein said PG-3-related biallelic marker
is selected from the group consisting of A6 and A7, and the complements thereof, or optionally the
biallelic markers in linkage disequilibrium therewith; optionally, the determination of the frequency
of a biallelic marker allele in a population may be accomplished by determining the identity of the
nucleotides for both copies of said biallelic marker present in the genome of each individual in said
population and calculating the proportional representation of said nucleotide at said PG-3-related
biallelic marker for the population; optionally, the determination of the proportional representation
may be accomplished by performing a genotyping method of the invention on a pooled biological
sample derived from a representative number of individuals, or each individual, in said population,
and calculating the proportional amount of said nucleotide compared with the total.

DETERMINING THE FREQUENCY OF A HAPLOTYPE IN A POPULATION 
The gametic phase of haplotypes is unknown when diploid individuals are heterozygous at
more than one locus. Using genealogical information in families, gametic phase can sometimes be
inferred (Perlin et al., 1994). When no genealogical information is available, different strategies
may be used. One possibility is that the multiple-site heterozygous diploids can be eliminated from
the analysis, keeping only the homozygotes and the single-site heterozygote individuals, but this
approach might bias the sample composition and lead to underestimation of low-frequency
haplotypes. Another possibility is that single chromosomes can be studied
independently, for example, by asymmetric PCR amplification (see Newton et al., 1989; Wu et al.,
1989) or by isolation of single chromosomes by limit dilution followed by PCR amplification (see
Ruano et al., 1990). Further, a sample may be haplotyped for sufficiently close biallelic markers by
double PCR amplification of specific alleles (Sarkar, G. and Sommer, S. S., 1991). These
approaches are not entirely satisfying either, because of their technical complexity, the additional
cost they entail, their lack of generalization at a large scale, or the possible biases they introduce.
To overcome these difficulties, an algorithm to infer the phase of PCR-amplified DNA genotypes
introduced by Clark, A.G. (1990) may be used. Briefly, the principle is to start filling a preliminary
list of haplotypes present in the sample by examining unambiguous individuals, that is, the
complete homozygotes and the single-site heterozygotes. Then other individuals in the same
sample are screened for the possible occurrence of previously recognized haplotypes. For each
positive identification, the complementary haplotype is added to the list of recognized haplotypes,
until the phase information for all individuals is either resolved or identified as unresolved. This
method assigns a single haplotype pair to each multiheterozygous individual, even though several
pairs are possible when there is more than one heterozygous site. Alternatively, one can use
methods estimating haplotype frequencies in a population without assigning haplotypes to each
individual. Preferably, a method based on an expectation-maximization (EM) algorithm (Dempster
et al., 1977) leading to maximum-likelihood estimates of haplotype frequencies under the
assumption of Hardy-Weinberg proportions (random mating) is used (see Excoffier L. and Slatkin
M., 1995). The EM algorithm is a generalized iterative maximum-likelihood approach to
estimation that is useful when data are ambiguous and/or incomplete. The EM algorithm is used to
resolve heterozygotes into haplotypes. Haplotype estimations are further described below under the
heading "Statistical Methods." Any other method known in the art to determine or to estimate the
frequency of a haplotype in a population may be used.
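The EM approach can be illustrated for the smallest case, two biallelic markers. Only double heterozygotes are phase-ambiguous: each carries either the 0-0/1-1 (cis) or the 0-1/1-0 (trans) haplotype pair. Each iteration splits these individuals between the two phases in proportion to the products of the current haplotype frequencies (E-step, assuming Hardy-Weinberg proportions) and then recounts haplotypes (M-step). The 0/1 allele coding, the representation of a genotype as the number of copies of allele 1 at each marker, and the function name are assumptions of this sketch; it is not the Excoffier and Slatkin implementation.

```python
from itertools import product

HAPLOTYPES = list(product((0, 1), repeat=2))  # (allele at marker 1, allele at marker 2)
_PAIR = {0: (0, 0), 1: (0, 1), 2: (1, 1)}      # genotype -> alleles on the two chromosomes


def em_two_marker_haplotypes(genotypes, max_iter=500, tol=1e-10):
    """Estimate the four two-marker haplotype frequencies by EM.

    `genotypes` is a list of (g1, g2) pairs, each g being the number of copies of
    allele 1 (0, 1 or 2) at that marker. Assumes Hardy-Weinberg proportions.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    fixed = {h: 0.0 for h in HAPLOTYPES}  # haplotype counts from unambiguous individuals
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
        else:
            a, b = _PAIR[g1], _PAIR[g2]
            fixed[(a[0], b[0])] += 1
            fixed[(a[1], b[1])] += 1
    n_chrom = 2 * len(genotypes)

    freqs = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(max_iter):
        # E-step: posterior probability that a double heterozygote is 00/11 (cis).
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: expected haplotype counts divided by the number of chromosomes.
        counts = dict(fixed)
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1 - w)
        counts[(1, 0)] += n_double_het * (1 - w)
        new = {h: c / n_chrom for h, c in counts.items()}
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs


est = em_two_marker_haplotypes([(0, 0), (0, 0), (2, 2), (2, 2), (1, 1)])  # hypothetical data
print({h: round(f, 3) for h, f in est.items()})  # {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
```

In this toy sample the unambiguous individuals carry only the 0-0 and 1-1 haplotypes, so the EM iterations assign the double heterozygote almost entirely to the cis phase.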

The invention also encompasses methods of estimating the frequency of a haplotype for a
set of biallelic markers in a population, comprising the steps of: a) genotyping at least one
PG-3-related biallelic marker according to a method of the invention for each individual in said
population; b) genotyping a second biallelic marker by determining the identity of the nucleotides at
said second biallelic marker for both copies of said second biallelic marker present in the genome of
each individual in said population; and c) applying a haplotype determination method to the
identities of the nucleotides determined in steps a) and b) to obtain an estimate of said frequency. In
addition, the methods of estimating the frequency of a haplotype of the invention encompass
methods with any further limitation described in this disclosure, or those following, alone or in any
combination: optionally, said PG-3-related biallelic marker is selected from the group consisting of
A1 to A80, and the complements thereof, or optionally the biallelic markers in linkage
disequilibrium therewith; optionally, wherein said PG-3-related biallelic marker is selected from the
group consisting of A1 to A5 and A8 to A80, and the complements thereof, or optionally the
biallelic markers in linkage disequilibrium therewith; optionally, wherein said PG-3-related biallelic
marker is selected from the group consisting of A6 and A7, and the complements thereof, or
optionally the biallelic markers in linkage disequilibrium therewith; optionally, said haplotype
determination method is performed by asymmetric PCR amplification, double PCR amplification of
specific alleles, the Clark algorithm, or an expectation-maximization algorithm.
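Clark's (1990) procedure, as summarized earlier in this section, can be sketched for any number of biallelic markers. Genotypes are coded as the number of copies of allele 1 at each marker (an assumption of this sketch). An individual heterozygous at no more than one marker is unambiguous; a recognized haplotype h can resolve a genotype g when h agrees with g at every homozygous marker, the complementary haplotype then being g - h. The outcome can depend on the order in which individuals are examined; this sketch simply uses input order, and all names are hypothetical.

```python
Genotype = tuple[int, ...]   # copies of allele 1 (0, 1 or 2) at each marker
Haplotype = tuple[int, ...]  # allele (0 or 1) at each marker


def _unambiguous_pair(g: Genotype) -> tuple[Haplotype, Haplotype]:
    """Phase of a genotype with at most one heterozygous marker."""
    h1 = tuple(1 if x == 2 else 0 for x in g)
    h2 = tuple(x - a for x, a in zip(g, h1))
    return h1, h2


def _compatible(h: Haplotype, g: Genotype) -> bool:
    """True if haplotype h can be one of the two chromosomes carrying genotype g."""
    return all(x == 1 or x == 2 * a for x, a in zip(g, h))


def clark_phase(genotypes: list[Genotype]):
    """Return (phases, recognized_haplotypes, unresolved_indices)."""
    known: list[Haplotype] = []
    phases: dict[int, tuple[Haplotype, Haplotype]] = {}
    # Seed the list with homozygotes and single-site heterozygotes.
    for i, g in enumerate(genotypes):
        if sum(1 for x in g if x == 1) <= 1:
            pair = _unambiguous_pair(g)
            phases[i] = pair
            for h in pair:
                if h not in known:
                    known.append(h)

    # Resolve remaining individuals with recognized haplotypes until no progress.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in phases:
                continue
            for h in known:
                if _compatible(h, g):
                    complement = tuple(x - a for x, a in zip(g, h))
                    phases[i] = (h, complement)
                    if complement not in known:
                        known.append(complement)
                    progress = True
                    break  # stop scanning `known`, which may just have grown
    unresolved = [i for i in range(len(genotypes)) if i not in phases]
    return phases, known, unresolved


phases, known, unresolved = clark_phase([(0, 0, 0), (2, 0, 2), (1, 1, 2)])  # hypothetical
print(phases[2])  # ((1, 0, 1), (0, 1, 1))
```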
Linkage Disequilibrium Analysis 

Linkage disequilibrium is the non-random association of alleles at two or more loci and
represents a powerful tool for mapping genes involved in disease traits (see Ajioka R.S. et al.,
1997). Biallelic markers, because they are densely spaced in the human genome and can be
genotyped in greater numbers than other types of genetic markers (such as RFLP or VNTR
markers), are particularly useful in genetic analysis based on linkage disequilibrium.

When a disease mutation is first introduced into a population (by a new mutation or the
immigration of a mutation carrier), it necessarily resides on a single chromosome and thus on a
single "background" or "ancestral" haplotype of linked markers. Consequently, there is complete
disequilibrium between these markers and the disease mutation: one finds the disease mutation only
in the presence of a specific set of marker alleles. Through subsequent generations recombination
events occur between the disease mutation and these marker polymorphisms, and the disequilibrium
gradually dissipates. The pace of this dissipation is a function of the recombination frequency, so
the markers closest to the disease gene will manifest higher levels of disequilibrium than those that
are further away. When not broken up by recombination, "ancestral" haplotypes and linkage
disequilibrium between marker alleles at different loci can be tracked not only through pedigrees
but also through populations. Linkage disequilibrium is usually seen as an association between one
specific allele at one locus and another specific allele at a second locus.

The pattern or curve of disequilibrium between disease and marker loci is expected to
exhibit a maximum that occurs at the disease locus. Consequently, the amount of linkage
disequilibrium between a disease allele and closely linked genetic markers may yield valuable
information regarding the location of the disease gene. For fine-scale mapping of a disease locus, it
is useful to have some knowledge of the patterns of linkage disequilibrium that exist between
markers in the studied region. As mentioned above, the mapping resolution achieved through the
analysis of linkage disequilibrium is much higher than that of linkage studies. The high density of
biallelic markers combined with linkage disequilibrium analysis provides powerful tools for
fine-scale mapping. Different methods to calculate linkage disequilibrium are described below under
the heading "Statistical Methods".
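Widely used pairwise measures of linkage disequilibrium between two biallelic markers are D = p11 - p1·q1, Lewontin's normalized D' = D/Dmax, and r² = D²/[p1(1 - p1)q1(1 - q1)], where p11 is the frequency of the haplotype carrying allele 1 at both markers and p1, q1 are the allele-1 frequencies at the first and second markers. The sketch below computes them from the four haplotype frequencies (for example, frequencies estimated by an EM procedure). These standard measures are chosen for illustration and are not necessarily the ones described under "Statistical Methods"; the dictionary layout and function name are assumptions.

```python
def ld_measures(freqs: dict[tuple[int, int], float]) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic markers.

    `freqs` maps each haplotype (allele at marker 1, allele at marker 2), with
    alleles coded 0/1, to its population frequency; the four values sum to 1.
    """
    p11 = freqs[(1, 1)]
    p1 = freqs[(1, 0)] + freqs[(1, 1)]  # frequency of allele 1 at marker 1
    q1 = freqs[(0, 1)] + freqs[(1, 1)]  # frequency of allele 1 at marker 2
    d = p11 - p1 * q1
    # Dmax depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


complete_ld = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}  # hypothetical
print(ld_measures(complete_ld))  # (0.25, 1.0, 1.0)
```

D' is signed here; it is often reported as its absolute value.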

Population-Based Case-Control Studies Of Trait-Marker Associations

As mentioned above, the occurrence of pairs of specific alleles at different loci on the same
chromosome is not random and the deviation from random is called linkage disequilibrium.
Association studies focus on population frequencies and rely on the phenomenon of linkage
disequilibrium. If a specific allele in a given gene is directly involved in causing a particular trait,
its frequency will be statistically increased in an affected (trait positive) population, when compared
to the frequency in a trait negative population or in a random control population. As a consequence
of the existence of linkage disequilibrium, the frequency of all other alleles present in the haplotype
carrying the trait-causing allele will also be increased in trait positive individuals compared to trait
negative individuals or random controls. Therefore, association between the trait and any allele
(specifically a biallelic marker allele) in linkage disequilibrium with the trait-causing allele will
suffice to suggest the presence of a trait-related gene in that particular region. Case-control
populations can be genotyped for biallelic markers to identify associations that narrowly locate a
trait-causing allele. Because any marker in linkage disequilibrium with a marker associated with a
trait will itself be associated with the trait, linkage disequilibrium allows the relative frequencies in
case-control populations of a limited number of genetic polymorphisms (specifically biallelic
markers) to be analyzed as an alternative to screening all possible functional polymorphisms in
order to find trait-causing alleles. Association studies compare the frequency of marker alleles in
unrelated case-control populations, and represent powerful tools for the dissection of complex traits.
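The comparison of marker allele frequencies between unrelated cases and controls is often summarized as a 2 × 2 table of allele counts. The sketch below computes the Pearson chi-square statistic (1 degree of freedom), its p-value and the allelic odds ratio. Counting alleles rather than individuals assumes Hardy-Weinberg proportions, and genotype-based tests are a common alternative; the function name and the example counts are hypothetical.

```python
import math


def allelic_association(case_counts: tuple[int, int],
                        control_counts: tuple[int, int]) -> tuple[float, float, float]:
    """Pearson chi-square (1 df), p-value and odds ratio for a 2x2 allele-count table.

    Rows: cases, controls. Columns: counts of allele 1 and allele 2.
    """
    table = (case_counts, control_counts)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi2, p_value, odds_ratio


chi2, p, odds = allelic_association((120, 80), (80, 120))  # hypothetical allele counts
print(chi2, odds)  # 16.0 2.25
```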
CASE-CONTROL POPULATIONS (INCLUSION CRITERIA) 
Population-based association studies do not concern familial inheritance but compare the
prevalence of a particular genetic marker, or a set of markers, in case-control populations. They are
case-control studies based on comparison of unrelated case (affected or trait positive) individuals
and unrelated control (unaffected, trait negative or random) individuals. Preferably the control
group is composed of unaffected or trait negative individuals. Further, the control group is
ethnically matched to the case population. Moreover, the control group is preferably matched to the
case population for the main known confounding factor for the trait under study (for example,
age-matched for an age-dependent trait). Ideally, individuals in the two samples are paired in such a
way that they are expected to differ only in their disease status. The terms "trait positive
population", "case population" and "affected population" are used interchangeably herein.

An important step in the dissection of complex traits using association studies is the choice of case-control populations (see Lander and Schork, 1994). A major step in the choice of case-control populations is the clinical definition of a given trait or phenotype. Any genetic trait may be analyzed by the association method proposed here by carefully selecting the individuals to be included in the trait positive and trait negative phenotypic groups. Four criteria are often useful: clinical phenotype, age at onset, family history and severity. The selection procedure for continuous or quantitative traits (such as blood pressure, for example) involves selecting individuals at opposite ends of the phenotype distribution of the trait under study, so as to include in these trait positive and trait negative populations individuals with non-overlapping phenotypes. Preferably, case-control populations consist of phenotypically homogeneous populations. Trait positive and
More particularly, the present invention relates to expression vectors which include nucleic acids encoding a PG-3 protein, preferably the PG-3 protein of the amino acid sequence of SEQ ID No 3, or variants or fragments thereof.

The invention also pertains to a recombinant expression vector useful for the expression of the PG-3 coding sequence, wherein said vector comprises a nucleic acid of SEQ ID No 2.

Recombinant vectors comprising a nucleic acid containing a PG-3-related biallelic marker are also part of the invention. In a preferred embodiment, said biallelic marker is selected from the group consisting of A1 to A80, and the complements thereof.

Some of the elements which can be found in the vectors of the present invention are described in further detail in the following sections.

The present invention also encompasses primary, secondary, and immortalized homologously recombinant host cells of vertebrate origin, preferably mammalian origin and particularly human origin, that have been engineered to: a) insert exogenous (heterologous) polynucleotides into the endogenous chromosomal DNA of a targeted gene, b) delete endogenous chromosomal DNA, and/or c) replace endogenous chromosomal DNA with exogenous polynucleotides. Insertions, deletions, and/or replacements of polynucleotide sequences may be to the coding sequences of the targeted gene and/or to regulatory regions, such as promoter and enhancer sequences, operably associated with the targeted gene.

The present invention further relates to a method of making a homologously recombinant host cell in vitro or in vivo, wherein the expression of a targeted gene not normally expressed in the cell is altered. Preferably the alteration causes expression of the targeted gene under normal growth conditions or under conditions suitable for producing the polypeptide encoded by the targeted gene. The method comprises the steps of: (a) transfecting the cell in vitro or in vivo with a polynucleotide construct, the polynucleotide construct comprising: (i) a targeting sequence; (ii) a regulatory sequence and/or a coding sequence; and (iii) an unpaired splice donor site, if necessary, thereby producing a transfected cell; and (b) maintaining the transfected cell in vitro or in vivo under conditions appropriate for homologous recombination.

The present invention further relates to a method of altering the expression of a targeted gene in a cell in vitro or in vivo wherein the gene is not normally expressed in the cell, comprising the steps of: (a) transfecting the cell in vitro or in vivo with a polynucleotide construct, the polynucleotide construct comprising: (i) a targeting sequence; (ii) a regulatory sequence and/or a coding sequence; and (iii) an unpaired splice donor site, if necessary, thereby producing a transfected cell; (b) maintaining the transfected cell in vitro or in vivo under conditions appropriate for homologous recombination, thereby producing a homologously recombinant cell; and (c) maintaining the homologously recombinant cell in vitro or in vivo under conditions appropriate for expression of the gene.
trait negative populations consist of phenotypically uniform populations of individuals, each representing between 1 and 98%, preferably between 1 and 80%, more preferably between 1 and 50%, still more preferably between 1 and 30%, and most preferably between 1 and 20% of the total population under study, and preferably selected among individuals exhibiting non-overlapping phenotypes. The clearer the difference between the two trait phenotypes, the greater the probability of detecting an association with biallelic markers. The selection of those drastically different but relatively uniform phenotypes enables efficient comparisons in association studies and the possible detection of marked differences at the genetic level, provided that the sample sizes of the populations under study are sufficiently large.
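As a rough illustration of selecting individuals at opposite ends of a quantitative-trait distribution, the following Python sketch ranks phenotype values and takes the same fraction of the study population from each tail. The function name `select_extremes`, the dictionary input and the per-tail `fraction` parameter are assumptions made for illustration. Capping `fraction` at 0.5 keeps the two groups disjoint, although individuals tied at the same value could still fall on both sides of the cut.

```python
def select_extremes(phenotypes, fraction):
    """Return (low_group, high_group) of individual IDs from opposite tails.

    phenotypes: dict mapping a (hypothetical) individual ID to a measured
    quantitative value, e.g. blood pressure.
    fraction: share of the study population placed in EACH tail; limited to
    (0, 0.5] so that the same individual cannot appear in both groups.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(phenotypes, key=phenotypes.get)  # IDs ordered by trait value
    k = max(1, int(len(ranked) * fraction))          # group size per tail
    return ranked[:k], ranked[-k:]
```

Which tail is treated as "trait positive" depends on the trait under study and is left to the caller.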
In preferred embodiments, a first group of between 50 and 300 trait positive individuals, preferably about 100 individuals, is recruited according to their phenotypes. A similar number of control individuals is included in such studies.
ASSOCIATION ANALYSIS 

The invention also comprises methods of detecting an association between a genotype and a phenotype, comprising the steps of: a) determining the frequency of at least one PG-3-related biallelic marker in a trait positive population according to a genotyping method of the invention; b) determining the frequency of said PG-3-related biallelic marker in a control population according to a genotyping method of the invention; and c) determining whether a statistically significant association exists between said genotype and said phenotype. In addition, the methods of detecting an association between a genotype and a phenotype of the invention encompass methods with any further limitation described in this disclosure, or those following, specified alone or in any combination: optionally, wherein said PG-3-related biallelic marker is selected from the group consisting of A1 to A80, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, wherein said PG-3-related biallelic marker is selected from the group consisting of A1 to A5 and A8 to A80, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, wherein said PG-3-related biallelic marker is selected from the group consisting of A6 and A7, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said control population may be a trait negative population or a random population; optionally, each of said genotyping steps a) and b) may be performed on a pooled biological sample derived from each of said populations; optionally, each of said genotyping steps a) and b) is performed separately on biological samples derived from each individual in said population or a subsample thereof; optionally, said trait is cancer susceptibility.

The general strategy to perform association studies using biallelic markers derived from a region carrying a candidate gene is to scan two groups of individuals (case-control populations) in order to measure and statistically compare the allele frequencies of the biallelic markers of the present invention in both groups.
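Comparing allele frequencies between the two groups first requires counting alleles from diploid genotypes. A minimal Python sketch, assuming each genotype is encoded as a two-character string such as "AG" (a hypothetical encoding, not one defined by this disclosure) and that the function name is illustrative only:

```python
from collections import Counter


def allele_frequency(genotypes, allele):
    """Frequency of `allele` among the 2N alleles of N diploid genotypes.

    genotypes: iterable of 2-character strings, e.g. "AA", "AG", "GG",
    one per individual, for a single biallelic marker.
    """
    counts = Counter()
    for g in genotypes:
        if len(g) != 2:
            raise ValueError(f"expected a diploid genotype, got {g!r}")
        counts.update(g)  # counts each of the two alleles carried
    return counts[allele] / sum(counts.values())
```

Applied separately to the case and control samples, this yields the two frequencies (or, via the underlying counts, the allele table) that a statistical test then compares.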
If a statistically significant association with a trait is identified for one or more of the analyzed biallelic markers, one can assume that either the associated allele is directly responsible for causing the trait (i.e. the associated allele is the trait-causing allele), or, more likely, the associated allele is in linkage disequilibrium with the trait-causing allele. The specific characteristics of the associated allele with respect to the candidate gene function usually give further insight into the relationship between the associated allele and the trait (causal or in linkage disequilibrium). If the evidence indicates that the associated allele within the candidate gene is most probably not the trait-causing allele but is in linkage disequilibrium with the real trait-causing allele, then the trait-causing allele can be found by sequencing the vicinity of the associated marker and performing further association studies, in an iterative manner, with the polymorphisms that are revealed.

Association studies are usually run in two successive steps. In a first phase, the frequencies of a reduced number of biallelic markers from the candidate gene are determined in the trait positive and control populations. In a second phase of the analysis, the position of the genetic loci responsible for the given trait is further refined using a higher density of markers from the relevant region. However, if the candidate gene under study is relatively small in length, as is the case for PG-3, a single phase may be sufficient to establish significant associations.

HAPLOTYPE ANALYSIS 

As described above, when a chromosome carrying a disease allele first appears in a population as a result of either mutation or migration, the mutant allele necessarily resides on a chromosome having a set of linked markers: the ancestral haplotype. This haplotype can be tracked through populations and its statistical association with a given trait can be analyzed. Complementing single-point (allelic) association studies with multi-point association studies, also called haplotype studies, increases the statistical power of association studies relative to analyses of individual markers. Thus, a haplotype association study allows one to define the frequency and the type of the ancestral carrier haplotype.

In a first stage of a haplotype frequency analysis, the frequency of the possible haplotypes based on various combinations of the identified biallelic markers of the invention is determined. The haplotype frequency is then compared for distinct populations of trait positive and control individuals. The number of trait positive individuals that should be subjected to this analysis to obtain statistically significant results usually ranges between 30 and 300, with a preferred number of individuals ranging between 50 and 150. The same considerations apply to the number of unaffected individuals (or random controls) used in the study. The results of this first analysis provide haplotype frequencies in case-control populations; for each evaluated haplotype frequency, a p-value and an odds ratio are calculated. If a statistically significant association is found, the relative risk for an individual carrying the given haplotype of being affected with the trait under study can be approximated.

An additional embodiment of the present invention encompasses methods of detecting an association between a haplotype and a phenotype, comprising the steps of: a) estimating the frequency of at least one haplotype in a trait positive population, according to a method of the invention for estimating the frequency of a haplotype; b) estimating the frequency of said haplotype in a control population, according to a method of the invention for estimating the frequency of a haplotype; and c) determining whether a statistically significant association exists between said haplotype and said phenotype. In addition, the methods of detecting an association between a haplotype and a phenotype of the invention encompass methods with any further limitation described in this disclosure, or those following: optionally, said PG-3-related biallelic marker is selected from the group consisting of A1 to A80, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, wherein said PG-3-related biallelic marker is selected from the group consisting of A1 to A5 and A8 to A80, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, wherein said PG-3-related biallelic marker is selected from the group consisting of A6 and A7, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said control population is a trait negative population or a random population; optionally, said method comprises the additional step of determining the phenotype in said trait positive and said control populations prior to step c); optionally, said trait is cancer susceptibility.
INTERACTION ANALYSIS 
The biallelic markers of the present invention may also be used to identify patterns of biallelic markers associated with detectable traits resulting from polygenic interactions. The analysis of genetic interaction between alleles at unlinked loci requires individual genotyping using the techniques described herein. The analysis of allelic interaction among a selected set of biallelic markers with an appropriate level of statistical significance can be considered as a haplotype analysis. Interaction analysis consists of stratifying the case-control populations with respect to a given haplotype at the first locus and performing a haplotype analysis at the second locus within each subpopulation.

Statistical methods used in association studies are further described below.

Testing For Linkage In The Presence Of Association 
The biallelic markers of the present invention may further be used in the TDT (transmission/disequilibrium test). The TDT tests for both linkage and association and is not affected by population stratification. The TDT requires data for affected individuals and their parents, or data from unaffected sibs instead of from parents (see Spielmann S. et al, 1993; Schaid D.J. et al, 1996; Spielmann S. and Ewens W.J., 1998). Such combined tests generally reduce the false-positive errors produced by separate analyses.
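The TDT is usually computed in its standard McNemar-type form from heterozygous parents only: b counts transmissions of the tested allele to affected children and c counts transmissions of the other allele. The Python sketch below assumes these counts have already been tabulated. The function name is hypothetical, and the p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission/disequilibrium test statistic and p-value (1 df).

    transmitted (b): times heterozygous parents passed the tested allele
    to an affected child; untransmitted (c): times they passed the other allele.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
    return chi2, p
```

For example, 60 transmissions against 40 non-transmissions give a statistic of 4.0 and a p-value of about 0.046.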



WO 01/14550 PCT/IB00/01098

STATISTICAL METHODS 

In general, any method known in the art to test whether a trait and a genotype show a 
statistically significant correlation may be used. 

1) Methods In Linkage Analysis 

Statistical methods and computer programs useful for linkage analysis are well known to those skilled in the art (see Terwilliger J.D. and Ott J., 1994; Ott J., 1991).

2) Methods To Estimate Haplotype Frequencies In A Population 

As described above, when genotypes are scored, it is often not possible to determine the gametic phase of individuals heterozygous at more than one marker, so that haplotype frequencies cannot be easily inferred. When the gametic phase is not known, haplotype frequencies can be estimated from the multilocus genotypic data. Any method known to a person skilled in the art can be used to estimate haplotype frequencies (see Lange K., 1997; Weir, B.S., 1996). Preferably, maximum-likelihood haplotype frequencies are computed using an Expectation-Maximization (EM) algorithm (see Dempster et al., 1977; Excoffier L. and Slatkin M., 1995). This procedure is an iterative process aiming at obtaining maximum-likelihood estimates of haplotype frequencies from multi-locus genotype data when the gametic phase is unknown. Haplotype estimations are usually performed by applying the EM algorithm using, for example, the EM-HAPLO program (Hawley M. E. et al., 1994) or the Arlequin program (Schneider et al., 1997). The EM algorithm is a generalized iterative maximum-likelihood approach to estimation and is briefly described below.

Please note that in the present section, "Methods To Estimate Haplotype Frequencies In A Population", phenotypes will refer to multi-locus genotypes with unknown haplotypic phase. Genotypes will refer to multi-locus genotypes with known haplotypic phase.

Suppose one has a sample of $N$ unrelated individuals typed for $K$ markers. The data observed are the unknown-phase $K$-locus phenotypes, which can be categorized into $F$ different phenotypes. Further, suppose that there are $H$ possible haplotypes (for $K$ biallelic markers, the maximum number of possible haplotypes is $H = 2^K$).

For phenotype $j$ with $c_j$ possible genotypes, we have:

$$P_j = \sum_{i=1}^{c_j} P(h_k, h_l) \qquad \text{(Equation 1)}$$

Here, $P_j$ is the probability of the $j$-th phenotype, and $P(h_k, h_l)$ is the probability of the $i$-th genotype, composed of haplotypes $h_k$ and $h_l$. Under random mating (i.e. Hardy-Weinberg Equilibrium), $P(h_k, h_l)$ is expressed as:

$$P(h_k, h_l) = \begin{cases} P(h_k)^2 & \text{for } h_k = h_l, \\ 2\,P(h_k)\,P(h_l) & \text{for } h_k \neq h_l. \end{cases} \qquad \text{(Equation 2)}$$

The E-M algorithm is composed of the following steps. First, the genotype frequencies are estimated from a set of initial values of haplotype frequencies. These haplotype frequencies are denoted $P_1^{(0)}, P_2^{(0)}, P_3^{(0)}, \ldots, P_H^{(0)}$. The initial values for the haplotype frequencies may be obtained from a random number generator or in some other way well known in the art. This step is referred to as the Expectation step. The next step in the method, called the Maximization step, consists of using the estimates for the genotype frequencies to re-calculate the haplotype frequencies. The first-iteration haplotype frequency estimates are denoted by $P_1^{(1)}, P_2^{(1)}, P_3^{(1)}, \ldots, P_H^{(1)}$. In general, the Expectation step at the $s$-th iteration consists of calculating the probability of placing each phenotype into the different possible genotypes based on the haplotype frequencies of the previous iteration:

$$P(h_k, h_l)^{(s)} = \frac{n_j}{N} \left[ \frac{P_j(h_k, h_l)^{(s)}}{P_j^{(s)}} \right] \qquad \text{(Equation 3)}$$

where $n_j$ is the number of individuals with the $j$-th phenotype and $P_j(h_k, h_l)^{(s)}$ is the probability of genotype $h_k, h_l$ in phenotype $j$. In the Maximization step, which is equivalent to the gene-counting method (Smith, 1957), the haplotype frequencies are re-estimated based on the genotype estimates:

$$P_t^{(s+1)} = \frac{1}{2} \sum_{j=1}^{F} \sum_{i=1}^{c_j} \delta_{it}\, P(h_k, h_l)^{(s)} \qquad \text{(Equation 4)}$$

Here, $\delta_{it}$ is an indicator variable that counts the number of occurrences of haplotype $t$ in the $i$-th genotype; it takes the values 0, 1 and 2.

The E-M iterations cease when the following criterion has been reached. Using Maximum Likelihood Estimation (MLE) theory, one assumes that the phenotypes $j$ are distributed multinomially. At each iteration $s$, one can compute the likelihood function $L$. Convergence is achieved when the difference of the log-likelihood between two consecutive iterations is less than some small number, preferably $10^{-7}$.
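The EM procedure above can be sketched in Python for K biallelic markers. This is a minimal illustration, not the EM-HAPLO or Arlequin implementation: genotypes are assumed to be coded as the number of copies (0, 1 or 2) of an allele labelled 1 at each locus, starting frequencies are uniform rather than random, and the function names are hypothetical. Only individuals heterozygous at two or more loci have more than one compatible genotype, so they are the only ones whose phase the Expectation step has to apportion.

```python
import itertools
import math
from collections import Counter


def compatible_pairs(genotype):
    """Unordered haplotype pairs (h1, h2) consistent with one unphased genotype.

    `genotype` gives, for each biallelic locus, the number of copies (0, 1 or 2)
    of the allele coded as 1. Haplotypes are tuples of 0/1 alleles.
    """
    if any(g not in (0, 1, 2) for g in genotype):
        raise ValueError(f"invalid genotype {genotype!r}")
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    for phase in itertools.product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for locus, allele in zip(het, phase):
            h1[locus], h2[locus] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotypes(genotypes, tol=1e-7, max_iter=1000):
    """EM estimate of haplotype frequencies from unphased multi-locus genotypes."""
    n_j = Counter(tuple(g) for g in genotypes)       # individuals per phenotype j
    n = sum(n_j.values())
    pairs = {j: compatible_pairs(j) for j in n_j}    # the c_j genotypes of phenotype j
    haplotypes = sorted({h for ps in pairs.values() for pair in ps for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform starting values
    prev_loglik = -math.inf
    for _ in range(max_iter):
        counts = dict.fromkeys(haplotypes, 0.0)
        loglik = 0.0
        for j, count in n_j.items():
            # Equation 2: genotype probabilities under Hardy-Weinberg equilibrium.
            probs = [freq[a] ** 2 if a == b else 2 * freq[a] * freq[b] for a, b in pairs[j]]
            p_j = sum(probs)                          # Equation 1
            loglik += count * math.log(p_j)
            for (a, b), p in zip(pairs[j], probs):
                expected = count * p / p_j            # E-step: expected individuals with (a, b)
                counts[a] += expected                 # a homozygote adds two copies
                counts[b] += expected
        # M-step (gene counting): divide by the 2N haplotypes in the sample.
        freq = {h: c / (2 * n) for h, c in counts.items()}
        if loglik - prev_loglik < tol:                # log-likelihood convergence criterion
            break
        prev_loglik = loglik
    return freq
```

For instance, a sample in which the only haplotypes present are 00 and 11 should return frequencies close to 0.5 for those two haplotypes and close to 0 for 01 and 10, even though the double heterozygotes alone could not distinguish the two phases.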

3) Methods To Calculate Linkage Disequilibrium Between Markers 
A number of methods can be used to calculate linkage disequilibrium between any two genetic positions; in practice, linkage disequilibrium is measured by applying a statistical association test to haplotype data taken from a population.

Linkage disequilibrium between any pair of biallelic markers comprising at least one of the biallelic markers of the present invention ($M_i$, $M_j$), having alleles ($a_i$/$b_i$) at marker $M_i$ and alleles ($a_j$/$b_j$) at marker $M_j$, can be calculated for every allele combination ($a_i,a_j$; $a_i,b_j$; $b_i,a_j$; $b_i,b_j$) according to the Piazza formula:

$$\Delta_{a_i a_j} = \sqrt{\theta_4} - \sqrt{(\theta_4 + \theta_3)(\theta_4 + \theta_2)}$$

where:
$\theta_4$ ("- -") = frequency of genotypes not having allele $a_i$ at $M_i$ and not having allele $a_j$ at $M_j$;
$\theta_3$ ("- +") = frequency of genotypes not having allele $a_i$ at $M_i$ and having allele $a_j$ at $M_j$;
$\theta_2$ ("+ -") = frequency of genotypes having allele $a_i$ at $M_i$ and not having allele $a_j$ at $M_j$.
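The Piazza formula needs only the three genotype frequencies theta_4, theta_3 and theta_2. A minimal Python sketch, assuming each individual is given as a pair (g_i, g_j) counting copies of allele a_i at M_i and of allele a_j at M_j (a hypothetical encoding; the function name is likewise illustrative). Under Hardy-Weinberg equilibrium, sqrt(theta_4) estimates the frequency of the b_i-b_j haplotype and the second square root estimates pr(b_i)pr(b_j), so the result estimates the gametic disequilibrium D.

```python
import math


def piazza_delta(genotypes):
    """Piazza estimate of linkage disequilibrium between alleles a_i and a_j.

    genotypes: list of (g_i, g_j), the number of copies (0, 1 or 2) of a_i at
    marker M_i and of a_j at marker M_j for each individual.
    """
    n = len(genotypes)
    theta4 = sum(1 for gi, gj in genotypes if gi == 0 and gj == 0) / n  # "- -"
    theta3 = sum(1 for gi, gj in genotypes if gi == 0 and gj > 0) / n   # "- +"
    theta2 = sum(1 for gi, gj in genotypes if gi > 0 and gj == 0) / n   # "+ -"
    return math.sqrt(theta4) - math.sqrt((theta4 + theta3) * (theta4 + theta2))
```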
Linkage disequilibrium (LD) between pairs of biallelic markers ($M_i$, $M_j$) can also be calculated for every allele combination ($a_i,a_j$; $a_i,b_j$; $b_i,a_j$; $b_i,b_j$) according to the maximum-likelihood estimate (MLE) for delta (the composite genotypic disequilibrium coefficient), as described by Weir (Weir B. S., 1996). The MLE for the composite linkage disequilibrium is:

$$D_{a_i a_j} = \frac{2n_1 + n_2 + n_3 + n_4/2}{N} - 2\,\mathrm{pr}(a_i)\,\mathrm{pr}(a_j)$$

where $n_1 = \sum$ phenotype ($a_i/a_i$, $a_j/a_j$), $n_2 = \sum$ phenotype ($a_i/a_i$, $a_j/b_j$), $n_3 = \sum$ phenotype ($a_i/b_i$, $a_j/a_j$), $n_4 = \sum$ phenotype ($a_i/b_i$, $a_j/b_j$), and $N$ is the number of individuals in the sample.

This formula allows linkage disequilibrium between alleles to be estimated when only genotype, and not haplotype, data are available.
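Weir's composite estimator can likewise be computed directly from genotype counts. The Python function below is a hypothetical sketch that uses the same (g_i, g_j) copy-number encoding as an assumption: n1 to n4 are the counts of the four genotype classes listed above, and pr(a_i), pr(a_j) are obtained by allele counting over the 2N alleles.

```python
def composite_ld(genotypes):
    """Weir's composite genotypic disequilibrium for alleles a_i and a_j.

    genotypes: list of (g_i, g_j) copy numbers (0, 1 or 2) of a_i and a_j.
    """
    n = len(genotypes)
    n1 = sum(1 for g in genotypes if g == (2, 2))  # a_i/a_i, a_j/a_j
    n2 = sum(1 for g in genotypes if g == (2, 1))  # a_i/a_i, a_j/b_j
    n3 = sum(1 for g in genotypes if g == (1, 2))  # a_i/b_i, a_j/a_j
    n4 = sum(1 for g in genotypes if g == (1, 1))  # a_i/b_i, a_j/b_j
    p_i = sum(gi for gi, _ in genotypes) / (2 * n)  # pr(a_i)
    p_j = sum(gj for _, gj in genotypes) / (2 * n)  # pr(a_j)
    return (2 * n1 + n2 + n3 + n4 / 2) / n - 2 * p_i * p_j
```

Under random mating the composite coefficient coincides with the gametic disequilibrium, so on Hardy-Weinberg data it should agree with the Piazza estimate.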

Another means of calculating the linkage disequilibrium between markers is as follows. For a pair of biallelic markers, $M_i$ ($a_i$/$b_i$) and $M_j$ ($a_j$/$b_j$), fitting the Hardy-Weinberg equilibrium, one can estimate the four possible haplotype frequencies in a given population according to the approach described above.

The estimation of gametic disequilibrium between $a_i$ and $a_j$ is simply:

$$D_{a_i a_j} = \mathrm{pr}(\text{haplotype}(a_i, a_j)) - \mathrm{pr}(a_i)\,\mathrm{pr}(a_j)$$

where $\mathrm{pr}(a_i)$ is the probability of allele $a_i$, $\mathrm{pr}(a_j)$ is the probability of allele $a_j$, and $\mathrm{pr}(\text{haplotype}(a_i, a_j))$ is estimated with the EM algorithm described above.

For a pair of biallelic markers, only one measure of disequilibrium is necessary to describe the association between $M_i$ and $M_j$.

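Then a normalized value of the above is calculated as follows:

$$D'_{a_i a_j} = \frac{D_{a_i a_j}}{\max\left(-\mathrm{pr}(a_i)\,\mathrm{pr}(a_j),\; -\mathrm{pr}(b_i)\,\mathrm{pr}(b_j)\right)} \quad \text{for } D_{a_i a_j} < 0$$

$$D'_{a_i a_j} = \frac{D_{a_i a_j}}{\min\left(\mathrm{pr}(b_i)\,\mathrm{pr}(a_j),\; \mathrm{pr}(a_i)\,\mathrm{pr}(b_j)\right)} \quad \text{for } D_{a_i a_j} > 0$$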
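A Python sketch of D and its normalized value D', assuming the haplotype frequency pr(haplotype(a_i, a_j)) has already been estimated (for example with the EM algorithm) and that both markers are biallelic, so that pr(b) = 1 - pr(a). The bounds are the usual Lewontin bounds; the function name is hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Return (D, D') for alleles a_i and a_j.

    p_ab: frequency of haplotype (a_i, a_j); p_a = pr(a_i); p_b = pr(a_j).
    """
    d = p_ab - p_a * p_b
    q_a, q_b = 1 - p_a, 1 - p_b          # pr(b_i), pr(b_j)
    if d < 0:
        bound = max(-p_a * p_b, -q_a * q_b)  # negative, so D' comes out positive
    else:
        bound = min(q_a * p_b, p_a * q_b)
    if bound == 0:                       # a monomorphic marker carries no LD information
        return d, 0.0
    return d, d / bound
```

A value of D' equal to 1 indicates that at most three of the four possible haplotypes are observed, whichever the sign of D.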

The skilled person will readily appreciate that other linkage disequilibrium calculation 
methods can be used. 

Linkage disequilibrium among a set of biallelic markers having an adequate heterozygosity rate can be determined by genotyping between 50 and 1000 unrelated individuals, preferably between 75 and 200, more preferably around 100.
4) Testing For Association 

The statistical significance of a correlation between a phenotype and a genotype, in this case an allele at a biallelic marker or a haplotype made up of such alleles, may be determined by any statistical test known in the art, with any accepted threshold of statistical significance. The application of particular methods and thresholds of significance is well within the skill of the ordinary practitioner of the art.

Testing for association is performed by determining the frequency of a biallelic marker allele in case and control populations and comparing these frequencies with a statistical test to determine whether there is a statistically significant difference in frequency, which would indicate a correlation between the trait and the biallelic marker allele under study. Similarly, a haplotype analysis is performed by estimating the frequencies of all possible haplotypes for a given set of biallelic markers in case and control populations, and comparing these frequencies with a statistical test to determine whether there is a statistically significant correlation between the haplotype and the phenotype (trait) under study. Any statistical tool useful to test for a statistically significant association between a genotype and a phenotype may be used. Preferably the statistical test employed is a chi-square test with one degree of freedom. A P-value is calculated (the P-value is the probability that a statistic as large as or larger than the observed one would occur by chance).
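For a single biallelic marker, the allelic test reduces to a Pearson chi-square on a 2 x 2 table of allele counts (cases versus controls, allele a versus allele b). A minimal Python sketch with a hypothetical function name; counting 2N alleles per group treats the two alleles of an individual as independent, which assumes Hardy-Weinberg equilibrium in the combined sample.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) and p-value for a 2x2 table of allele counts."""
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    total = case_a + case_b + ctrl_a + ctrl_b
    rows = [case_a + case_b, ctrl_a + ctrl_b]
    cols = [case_a + ctrl_a, case_b + ctrl_b]
    if 0 in rows or 0 in cols:
        raise ValueError("a row or column of the table is empty")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total  # count expected under no association
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

For example, 60/40 allele counts in cases against 40/60 in controls give a statistic of 8.0 and a p-value of roughly 0.0047.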
STATISTICAL SIGNIFICANCE 

In preferred embodiments, to establish significance for diagnostic purposes, either as a positive basis for further diagnostic tests or as a preliminary starting point for early preventive therapy, the p-value related to a biallelic marker association is preferably about $1 \times 10^{-2}$ or less, more preferably about $1 \times 10^{-4}$ or less, for a single biallelic marker analysis, and about $1 \times 10^{-3}$ or less, still more preferably $1 \times 10^{-6}$ or less and most preferably about $1 \times 10^{-8}$ or less, for a haplotype analysis involving two or more markers. These values are believed to be applicable to any association studies involving single or multiple marker combinations.

The skilled person can use the range of values set forth above as a starting point in order to carry out association studies with biallelic markers of the present invention. In doing so, significant associations between the biallelic markers of the present invention and a trait can be revealed and used for diagnosis and drug screening purposes.
PHENOTYPIC PERMUTATION 

In order to confirm the statistical significance of the first-stage haplotype analysis described above, it might be suitable to perform further analyses in which genotyping data from case-control individuals are pooled and randomized with respect to the trait phenotype. The genotyping data of each individual are randomly allocated to two groups, which contain the same number of individuals as the case-control populations used to compile the data obtained in the first stage. A second-stage haplotype analysis is preferably run on these artificial groups, preferably for the markers included in the haplotype of the first-stage analysis showing the highest relative risk coefficient. This experiment is preferably repeated between 100 and 10,000 times. The repeated iterations allow the determination of the probability of obtaining the tested haplotype by chance.
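The permutation procedure can be sketched generically: pool all individuals, reshuffle the case/control labels while keeping the group sizes, and count how often the shuffled statistic reaches the observed one. In this Python sketch the statistic `freq_diff` (absolute allele-frequency difference, with each genotype coded as the number of copies of one allele) is a hypothetical stand-in for the second-stage haplotype analysis. The add-one correction in the returned p-value is a common convention, not something specified above.

```python
import random


def freq_diff(cases, controls):
    """Absolute allele-frequency difference; genotypes coded as copies (0-2)."""
    return abs(sum(cases) / (2 * len(cases)) - sum(controls) / (2 * len(controls)))


def permutation_p_value(case_genos, ctrl_genos, statistic, n_perm=1000, seed=0):
    """Share of label permutations whose statistic is >= the observed value."""
    rng = random.Random(seed)                 # seeded for reproducibility
    observed = statistic(case_genos, ctrl_genos)
    pooled = list(case_genos) + list(ctrl_genos)
    n_case = len(case_genos)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                   # random reallocation, same group sizes
        if statistic(pooled[:n_case], pooled[n_case:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```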
ASSESSMENT OF STATISTICAL ASSOCIATION 

To address the problem of false positives, a similar analysis may be performed with the same case-control populations in random genomic regions. Results in random regions and the candidate region are compared as described in a co-pending US Provisional Patent Application entitled "Methods, Software And Apparati For Identifying Genomic Regions Harboring A Gene Associated With A Detectable Trait," U.S. Serial Number 60/107,986, filed November 10, 1998, and in a second U.S. Provisional Patent Application also entitled "Methods, Software And Apparati For Identifying Genomic Regions Harboring A Gene Associated With A Detectable Trait," U.S. Serial Number 60/140,785, filed June 23, 1999.

5) Evaluation Of Risk Factors 
The association between a risk factor (in genetic epidemiology the risk factor is the presence or the absence of a certain allele or haplotype at marker loci) and a disease is measured by the odds ratio (OR) and by the relative risk (RR). If $P(R^{+})$ is the probability of developing the disease for individuals with the risk factor and $P(R^{-})$ is the probability for individuals without the risk factor, then the relative risk is simply the ratio of the two probabilities, that is:

$$RR = \frac{P(R^{+})}{P(R^{-})}$$

In case-control studies, direct measures of the relative risk cannot be obtained because of the sampling design. However, the odds ratio allows a good approximation of the relative risk for low-incidence diseases and can be calculated as:

$$OR = \frac{F^{+}/(1 - F^{+})}{F^{-}/(1 - F^{-})}$$

$F^{+}$ is the frequency of exposure to the risk factor in cases and $F^{-}$ is the frequency of exposure to the risk factor in controls. $F^{+}$ and $F^{-}$ are calculated using the allelic or haplotype frequencies of the study and further depend on the underlying genetic model (dominant, recessive, additive...).

One can further estimate the attributable risk (AR), which describes the proportion of individuals in a population exhibiting a trait due to a given risk factor. This measure is important in quantifying the role of a specific factor in disease etiology and in terms of the public health impact of a risk factor. The public health relevance of this measure lies in estimating the proportion of cases of disease in the population that could be prevented if the exposure of interest were absent. AR is determined as follows:

$$AR = \frac{P_E\,(RR - 1)}{P_E\,(RR - 1) + 1}$$

AR is the risk attributable to a biallelic marker allele or a biallelic marker haplotype. $P_E$ is the frequency of exposure to an allele or a haplotype within the population at large, and RR is the relative risk, which is approximated with the odds ratio when the trait under study has a relatively low incidence in the general population.

IDENTIFICATION OF BIALLELIC MARKERS IN LINKAGE DISEQUILIBRIUM WITH THE BIALLELIC MARKERS OF THE INVENTION

Once a first biallelic marker has been identified in a genomic region of interest, the practitioner of ordinary skill in the art, using the teachings of the present invention, can easily identify additional biallelic markers in linkage disequilibrium with this first marker. As mentioned before, any marker in linkage disequilibrium with a first marker associated with a trait will be associated with the trait. Therefore, once an association has been demonstrated between a given biallelic marker and a trait, the discovery of additional biallelic markers associated with this trait is of great interest in order to increase the density of biallelic markers in this particular region. The causal gene or mutation is expected to be found in the vicinity of the marker or set of markers showing the highest correlation with the trait.
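The relative risk, odds ratio and attributable risk formulas given in the evaluation of risk factors above translate directly into code. A minimal Python sketch with hypothetical function names; the exposure frequencies F+ and F- are assumed to have been derived already from allele or haplotype frequencies under a chosen genetic model.

```python
def relative_risk(p_exposed, p_unexposed):
    """RR = P(R+) / P(R-)."""
    return p_exposed / p_unexposed


def odds_ratio(f_case, f_ctrl):
    """OR from exposure frequencies F+ (cases) and F- (controls), each in (0, 1)."""
    if not (0 < f_case < 1 and 0 < f_ctrl < 1):
        raise ValueError("exposure frequencies must lie strictly between 0 and 1")
    return (f_case / (1 - f_case)) / (f_ctrl / (1 - f_ctrl))


def attributable_risk(p_e, rr):
    """AR = P_E (RR - 1) / (P_E (RR - 1) + 1)."""
    x = p_e * (rr - 1)
    return x / (x + 1)
```

For a low-incidence trait the odds ratio returned by `odds_ratio` can be passed as `rr` to `attributable_risk`, following the approximation described above.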

Identification of additional markers in linkage disequilibrium with a given marker involves: (a) amplifying a genomic fragment comprising a first biallelic marker from a plurality of individuals; (b) identifying second biallelic markers in the genomic region harboring said first biallelic marker; (c) conducting a linkage disequilibrium analysis between said first biallelic marker and second biallelic markers; and (d) selecting said second biallelic markers as being in linkage disequilibrium with said first marker. Subcombinations comprising steps (b) and (c) are also contemplated.

Methods to identify biallelic markers and to conduct linkage disequilibrium analysis are described herein and can be carried out by the skilled person without undue experimentation. The present invention therefore also concerns biallelic markers which are in linkage disequilibrium with the biallelic markers A1 to A80 and which are expected to present similar characteristics in terms of their respective association with a given trait.

IDENTIFICATION OF FUNCTIONAL MUTATIONS

Mutations in the PG-3 gene which are responsible for a detectable phenotype or trait may be identified by comparing the sequences of the PG-3 gene from trait positive and control individuals. Once a positive association is confirmed with a biallelic marker of the present invention, the identified locus can be scanned for mutations. In a preferred embodiment, functional regions such as exons and splice sites, promoters and other regulatory regions of the PG-3 gene are scanned for mutations. In a preferred embodiment, the sequence of the PG-3 gene is compared in trait positive and control individuals. Preferably, trait positive individuals carry the haplotype shown to be associated with the trait, and trait negative individuals do not carry the haplotype or allele associated with the trait. The detectable trait or phenotype may comprise a variety of manifestations of altered PG-3 function.

The mutation detection procedure is essentially similar to that used for biallelic marker 
identification. The method used to detect such mutations generally comprises the following steps: 

- amplification of a region of the PG-3 gene comprising a biallelic marker or a group of biallelic markers associated with the trait from DNA samples of trait positive patients and trait-negative controls;

- sequencing of the amplified region; 

- comparison of DNA sequences from trait positive and control individuals; 

- determination of mutations specific to trait-positive patients. 
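The comparison and determination steps above can be illustrated with a deliberately simplified sketch. It assumes the amplified PG-3 regions have already been sequenced and aligned to a common length, with 'N' marking uncalled bases. The function name is hypothetical, positions are 0-based, and a real analysis would weigh allele frequencies rather than strict presence or absence.

```python
def case_specific_variants(cases: list[str], controls: list[str]) -> dict[int, set[str]]:
    """Map each alignment position to the alleles observed only in trait-positive sequences.

    Assumes every sequence covers the same PG-3 amplicon and is pre-aligned to a common
    length; 'N' denotes an uncalled base and is ignored.
    """
    if not cases or not controls:
        raise ValueError("need at least one case and one control sequence")
    sequences = cases + controls
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    variants: dict[int, set[str]] = {}
    for pos in range(length):
        case_alleles = {s[pos].upper() for s in cases} - {"N"}
        control_alleles = {s[pos].upper() for s in controls} - {"N"}
        specific = case_alleles - control_alleles  # seen in cases, never in controls
        if specific:
            variants[pos] = specific
    return variants
```

For example, `case_specific_variants(["ACGTA", "ACTTA"], ["ACGTA", "ACGTA"])` returns `{2: {'T'}}`, flagging the T allele at the third position as a candidate for verification in a larger population.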

In one embodiment, the biallelic marker used in this procedure is selected from the group consisting of A1 to A80, and the complements thereof. Candidate polymorphisms are preferably then verified by screening a larger population of cases and controls by means of any genotyping procedure such as those described herein, preferably using a microsequencing technique in an individual test format. Polymorphisms are considered as candidate mutations when present in cases and controls at frequencies compatible with the expected association results. Polymorphisms are considered as candidate "trait-causing" mutations when they exhibit a statistically significant correlation with the detectable phenotype.
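One common way to assess whether a polymorphism shows a statistically significant correlation with the phenotype is an allelic chi-square test on case and control allele counts. The sketch below assumes a 2 x 2 table of allele counts that is large enough for the chi-square approximation, and uses the Pearson statistic with one degree of freedom and no continuity correction. The function name is hypothetical; this is an illustration of one possible test, not the association method of the invention.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) on a 2 x 2 allele-count table.

    Rows are cases and controls; columns are the two alleles of a biallelic marker.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("all row and column totals must be positive")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 df: P(X >= x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With 30 and 10 copies of the two alleles in cases against 10 and 30 in controls, `allelic_chi_square(30, 10, 10, 30)` gives a statistic of 20.0 and a p-value below 1e-5, while identical case and control counts give a statistic of 0 and a p-value of 1.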

RECOMBINANT VECTORS

The term "vector" is used herein to designate either a circular or a linear DNA or RNA molecule, which is either double-stranded or single-stranded, and which comprises at least one polynucleotide of interest that is sought to be transferred into a cell host or into a unicellular or multicellular host organism.

The present invention encompasses a family of recombinant vectors that comprise a regulatory polynucleotide derived from the PG-3 genomic sequence, and/or a coding polynucleotide from either the PG-3 genomic sequence or the cDNA sequence.

Generally, a recombinant vector of the invention may comprise any of the polynucleotides described herein, including regulatory sequences, coding sequences and polynucleotide constructs, as well as any PG-3 primer or probe as defined above. More particularly, the recombinant vectors of the present invention can comprise any of the polynucleotides described in the "Genomic Sequences Of The PG3 Gene" section, the "PG-3 cDNA Sequences" section, the "Coding Regions" section, the "Polynucleotide constructs" section, and the "Oligonucleotide Probes And Primers" section.

In a first preferred embodiment, a recombinant vector of the invention is used to amplify the inserted polynucleotide derived from a PG-3 genomic sequence of SEQ ID No 1 or a PG-3 cDNA, for example the cDNA of SEQ ID No 2, in a suitable cell host, this polynucleotide being amplified each time the recombinant vector replicates.

A second preferred embodiment of the recombinant vectors according to the invention comprises expression vectors comprising either a regulatory polynucleotide or a coding nucleic acid of the invention, or both. Within certain embodiments, expression vectors are employed to express the PG-3 polypeptide, which can then be purified and, for example, be used in ligand screening assays or as an immunogen in order to raise specific antibodies directed against the PG-3 protein. In other embodiments, the expression vectors are used for constructing transgenic animals and also for gene therapy. Expression requires that appropriate signals be provided in the vectors, said signals including various regulatory elements, such as enhancers/promoters from both viral and mammalian sources, that drive expression of the genes of interest in host cells. Dominant drug selection markers for establishing permanent, stable cell clones expressing the products are generally included in the expression vectors of the invention, as are elements that link expression of the drug selection markers to expression of the polypeptide.
The present invention further relates to a method of making a polypeptide of the present invention by altering the expression of a targeted endogenous gene in a cell in vitro or in vivo, wherein the gene is not normally expressed in the cell, comprising the steps of: (a) transfecting the cell in vitro with a polynucleotide construct, the polynucleotide construct comprising: (i) a targeting sequence; (ii) a regulatory sequence and/or a coding sequence; and (iii) an unpaired splice donor site, if necessary, thereby producing a transfected cell; (b) maintaining the transfected cell in vitro or in vivo under conditions appropriate for homologous recombination, thereby producing a homologously recombinant cell; and (c) maintaining the homologously recombinant cell in vitro or in vivo under conditions appropriate for expression of the gene, thereby making the polypeptide.

The present invention further relates to a polynucleotide construct which alters the expression of a targeted gene in a cell type in which the gene is not normally expressed. This occurs when the polynucleotide construct is inserted into the chromosomal DNA of the target cell, wherein the polynucleotide construct comprises: (a) a targeting sequence; (b) a regulatory sequence and/or coding sequence; and (c) an unpaired splice-donor site, if necessary. Further included are polynucleotide constructs, as described above, wherein the construct further comprises a polynucleotide which encodes a polypeptide and is in-frame with the targeted endogenous gene after homologous recombination with chromosomal DNA.

The compositions may be produced, and methods performed, by techniques known in the art, such as those described in U.S. Patent Nos. 6,054,288; 6,048,729; 6,048,724; 6,048,524; 5,994,127; 5,968,502; 5,965,125; 5,869,239; 5,817,789; 5,783,385; 5,733,761; 5,641,670; 5,580,734; International Publication Nos. WO 96/29411 and WO 94/12650; and scientific articles including Koller et al., 1989.

1. General features of the expression vectors of the invention 

A recombinant vector according to the invention comprises, but is not limited to, a YAC (Yeast Artificial Chromosome), a BAC (Bacterial Artificial Chromosome), a phage, a phagemid, a cosmid, a plasmid or even a linear DNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic DNA. Such a recombinant vector can comprise a transcriptional unit comprising an assembly of:

(1) a genetic element or elements having a regulatory role in gene expression, for example promoters or enhancers. Enhancers are cis-acting elements of DNA, usually from about 10 to 300 bp in length, that act on the promoter to increase transcription.

(2) a structural or coding sequence which is transcribed into mRNA and eventually 
translated into a polypeptide, said structural or coding sequence being operably linked to the 
regulatory elements described in (1); and 

(3) appropriate transcription initiation and termination sequences. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell. Alternatively, when a recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.

Generally, recombinant expression vectors will include origins of replication, selectable markers permitting transformation of the host cell, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably a leader sequence capable of directing secretion of the translated protein into the periplasmic space or the extracellular medium. In a specific embodiment wherein the vector is adapted for transfecting and expressing desired sequences in mammalian host cells, preferred vectors will comprise an origin of replication in the desired host, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation signal, splice donor and acceptor sites, transcriptional termination sequences, and 5'-flanking non-transcribed sequences. DNA sequences derived from the SV40 viral genome, for example the SV40 origin, early promoter, enhancer, splice and polyadenylation signals, may be used to provide the required non-transcribed genetic elements.

The in vivo expression of a PG-3 polypeptide of SEQ ID No 3 or fragments or variants thereof may be useful in order to correct a genetic defect related to the expression of the native gene in a host organism or to the production of a biologically inactive PG-3 protein.

Consequently, the present invention also deals with recombinant expression vectors mainly designed for the in vivo production of the PG-3 polypeptide of SEQ ID No 3 or fragments or variants thereof by the introduction of the appropriate genetic material into the organism of the patient to be treated. This genetic material may be introduced in vitro into a cell that has been previously extracted from the organism, the modified cell being subsequently reintroduced into said organism, or directly in vivo into the appropriate tissue.

2. Regulatory Elements 
PROMOTERS 

The suitable promoter regions used in the expression vectors according to the present invention are chosen taking into account the cell host in which the heterologous gene has to be expressed. The particular promoter employed to control the expression of a nucleic acid sequence of interest is not believed to be important, so long as it is capable of directing the expression of the nucleic acid in the targeted cell. Thus, where a human cell is targeted, it is preferable to position the nucleic acid coding region adjacent to and under the control of a promoter that is active in a human cell, such as, for example, a human or a viral promoter.

A suitable promoter may be heterologous with respect to the nucleic acid whose expression it controls, or alternatively can be endogenous to the native polynucleotide containing the coding sequence to be expressed. Additionally, the promoter is generally heterologous with respect to the recombinant vector sequences into which the promoter/coding sequence construct has been inserted.

Promoter regions can be selected from any desired gene using, for example, CAT (chloramphenicol acetyltransferase) vectors and more preferably pKK232-8 and pCM7 vectors.

Preferred bacterial promoters are the LacI, LacZ, the T3 or T7 bacteriophage RNA polymerase promoters, the gpt, lambda PR, PL and trp promoters (EP 0036776), the polyhedrin promoter, or the p10 protein promoter from baculovirus (Kit Novagen) (Smith et al., 1983; O'Reilly et al., 1992), the lambda PR promoter or also the trc promoter.

Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of a convenient vector and promoter is well within the level of ordinary skill in the art.

The choice of a promoter is well within the ability of a person skilled in the field of genetic engineering. For example, one may refer to the book of Sambrook et al. (1989) or to the procedures described by Fuller et al. (1996).
OTHER REGULATORY ELEMENTS

Where a cDNA insert is employed, one will typically desire to include a polyadenylation signal to effect proper polyadenylation of the gene transcript. The nature of the polyadenylation signal is not believed to be crucial to the successful practice of the invention, and any such sequence may be employed, such as the human growth hormone and SV40 polyadenylation signals. Also contemplated as an element of the expression cassette is a terminator. These elements can serve to enhance message levels and to minimize read-through from the cassette into other sequences.

3. Selectable Markers 

Such markers confer an identifiable change to the cell, permitting easy identification of cells containing the expression construct. The selectable marker genes for selection of transformed host cells are preferably dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, TRP1 for S. cerevisiae, tetracycline, rifampicin or ampicillin resistance in E. coli, or levan saccharase for mycobacteria, this latter marker being a negative selection marker.

4. Preferred Vectors
BACTERIAL VECTORS 

As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable marker and a bacterial origin of replication derived from commercially available plasmids comprising genetic elements of pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia, Uppsala, Sweden), and GEM1 (Promega Biotec, Madison, WI, USA).

Large numbers of other suitable vectors are known to those of skill in the art and are commercially available, such as the following bacterial vectors: pQE70, pQE60, pQE-9 (Qiagen), pbs, pD10, phagescript, psiX174, pbluescript SK, pbsks, pNH8A, pNH16A, pNH18A, pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia); pQE-30 (QIAexpress).

WO 01/14550 PCT/IB00/01098

BACTERIOPHAGE VECTORS 
The P1 bacteriophage vector may contain large inserts ranging from about 80 to about 100 kb.

The construction of P1 bacteriophage vectors such as p158 or p158/neo8 is notably described by Sternberg (1992, 1994). Recombinant P1 clones comprising PG-3 nucleotide sequences may be designed for inserting large polynucleotides of more than 40 kb (Linton et al., 1993). To generate P1 DNA for transgenic experiments, a preferred protocol is the one described by McCormick et al. (1994). Briefly, E. coli (preferably strain NS3529) harboring the P1 plasmid is grown overnight in a suitable broth medium containing 25 µg/ml of kanamycin. The P1 DNA is prepared from the E. coli by alkaline lysis using the Qiagen Plasmid Maxi kit (Qiagen, Chatsworth, CA, USA), according to the manufacturer's instructions. The P1 DNA is purified from the bacterial lysate on two Qiagen-tip 500 columns, using the washing and elution buffers contained in the kit. A phenol/chloroform extraction is then performed before precipitating the DNA with 70% ethanol. After solubilizing the DNA in TE (10 mM Tris-HCl, pH 7.4, 1 mM EDTA), the concentration of the DNA is assessed by spectrophotometry.
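The spectrophotometric assessment of DNA concentration is usually based on absorbance at 260 nm. The sketch below assumes the widely used convention that an A260 of 1.0 (1 cm path length) corresponds to about 50 µg/ml of double-stranded DNA, and reports the A260/A280 ratio as a rough purity check. Neither value is specified in the text, and the function names are hypothetical.

```python
# Conventional conversion factor for double-stranded DNA (assumption, not from the text):
# an A260 of 1.0 in a 1 cm cuvette corresponds to roughly 50 µg/ml.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (µg/ml, numerically equal to ng/µl) from A260."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("absorbance must be non-negative and dilution factor positive")
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Rough purity indicator; values near 1.8 are typical of protein-free DNA."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280
```

For a hypothetical reading of 0.2 on a 1:10 dilution of the purified DNA, `dsdna_concentration_ug_per_ml(0.2, dilution_factor=10)` gives about 100 µg/ml.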

When the goal is to express a P1 clone comprising PG-3 nucleotide sequences in a transgenic animal, typically in transgenic mice, it is desirable to remove vector sequences from the P1 DNA fragment, for example by cleaving the P1 DNA at rare-cutting sites within the P1 polylinker (SfiI, NotI or SalI). The P1 insert is then purified from vector sequences on a pulsed-field agarose gel, using methods similar to those originally reported for the isolation of DNA from YACs (Schedl et al., 1993a; Peterson et al., 1993). At this stage, the resulting purified insert DNA can be concentrated, if necessary, on a Millipore Ultrafree-MC Filter Unit (Millipore, Bedford, MA, USA; 30,000 molecular weight limit) and then dialyzed against microinjection buffer (10 mM Tris-HCl, pH 7.4; 250 µM EDTA) containing 100 mM NaCl, 30 µM spermine and 70 µM spermidine on a microdialysis membrane (type VS, 0.025 µm, from Millipore). The intactness of the purified P1 DNA insert is assessed by electrophoresis on a 1% agarose (SeaKem GTG; FMC Bio-products) pulsed-field gel and staining with ethidium bromide.
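Preparing the microinjection buffer described above from concentrated stocks is a direct application of C1 x V1 = C2 x V2. In the sketch below, the final concentrations are those given in the text, whereas the stock concentrations and the function name are hypothetical and should be replaced by those actually available.

```python
# Final concentrations of the microinjection buffer, in mol/l (values from the text)
FINAL_M = {
    "Tris-HCl pH 7.4": 10e-3,
    "EDTA": 250e-6,
    "NaCl": 100e-3,
    "spermine": 30e-6,
    "spermidine": 70e-6,
}

# Stock concentrations in mol/l: hypothetical, replace with the stocks actually on hand
STOCK_M = {
    "Tris-HCl pH 7.4": 1.0,
    "EDTA": 0.5,
    "NaCl": 5.0,
    "spermine": 10e-3,
    "spermidine": 10e-3,
}


def buffer_recipe(final_volume_ul: float) -> dict[str, float]:
    """Volume (µl) of each stock, plus water to volume, using C1 * V1 = C2 * V2."""
    volumes = {name: FINAL_M[name] * final_volume_ul / STOCK_M[name] for name in FINAL_M}
    volumes["water"] = final_volume_ul - sum(volumes.values())
    if volumes["water"] < 0:
        raise ValueError("stocks too dilute for the requested final volume")
    return volumes
```

With these assumed stocks, `buffer_recipe(10_000)` calls for about 100 µl Tris-HCl, 5 µl EDTA, 200 µl NaCl, 30 µl spermine and 70 µl spermidine, made up to 10 ml with 9,595 µl of water.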
BACULOVIRUS VECTORS 

A suitable vector for the expression of the PG-3 polypeptide of SEQ ID No 3 or fragments or variants thereof is a baculovirus vector that can be propagated in insect cells and in insect cell lines. A specific suitable host vector system is the pVL1392/1393 baculovirus transfer vector (Pharmingen) that is used to transfect the SF9 cell line (ATCC N° CRL-1711), which is derived from Spodoptera frugiperda.
Other suitable vectors for the expression of the PG-3 polypeptide of SEQ ID No 3 or fragments or variants thereof in a baculovirus expression system include those described by Chai et al. (1993), Vlasak et al. (1983) and Lenhard et al. (1996).

VIRAL VECTORS 

In one specific embodiment, the vector is derived from an adenovirus. Preferred adenovirus vectors according to the invention are those described by Feldman and Steg (1996) or Ohno et al. (1994). Another preferred recombinant adenovirus according to this specific embodiment of the present invention is the human adenovirus type 2 or 5 (Ad 2 or Ad 5) or an adenovirus of animal origin (French patent application N° FR-93.05954).

Retrovirus vectors and adeno-associated virus vectors are generally understood to be the recombinant gene delivery systems of choice for the transfer of exogenous polynucleotides in vivo, particularly to mammals, including humans. These vectors provide efficient delivery of genes into cells, and the transferred nucleic acids are stably integrated into the chromosomal DNA of the host. Particularly preferred retroviruses for the preparation or construction of retroviral in vitro or in vivo gene delivery vehicles of the present invention include retroviruses selected from the group consisting of Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis virus and Rous Sarcoma virus. Particularly preferred Murine Leukemia Viruses include the 4070A and the 1504A viruses, Abelson (ATCC No VR-999), Friend (ATCC No VR-245), Gross (ATCC No VR-590), Rauscher (ATCC No VR-998) and Moloney Murine Leukemia Virus (ATCC No VR-190; PCT Application No WO 94/24298). Particularly preferred Rous Sarcoma Viruses include Bryan high titer (ATCC Nos VR-334, VR-657, VR-726, VR-659 and VR-728). Other preferred retroviral vectors are those described in Roth et al. (1996), PCT Application No WO 93/25234, PCT Application No WO 94/06920, Roux et al., 1989, Julan et al., 1992 and Neda et al., 1991.

Yet another viral vector system that is contemplated by the invention consists of the adeno-associated virus (AAV). The adeno-associated virus is a naturally occurring defective virus that requires another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient replication and a productive life cycle (Muzyczka et al., 1992). It is also one of the few viruses that may integrate its DNA into non-dividing cells, and it exhibits a high frequency of stable integration (Flotte et al., 1992; Samulski et al., 1989; McLaughlin et al., 1989). One advantageous feature of AAV derives from its reduced efficacy for transducing primary cells relative to transformed cells.
BAC VECTORS 

The bacterial artificial chromosome (BAC) cloning system (Shizuya et al., 1992) has been developed to stably maintain large fragments of genomic DNA (100-300 kb) in E. coli. A preferred BAC vector consists of the pBeloBAC11 vector that has been described by Kim et al. (1996). BAC libraries are prepared with this vector using size-selected genomic DNA that has been partially digested using enzymes that permit ligation into either the BamHI or HindIII sites in the vector. Flanking these cloning sites are T7 and SP6 RNA polymerase transcription initiation sites that can be used to generate end probes by either RNA transcription or PCR methods. After the construction of a BAC library in E. coli, BAC DNA is purified from the host cell as a supercoiled circle. Converting these circular molecules into a linear form precedes both size determination and introduction of the BACs into recipient cells. The cloning site is flanked by two Not I sites, permitting cloned segments to be excised from the vector by Not I digestion. Alternatively, the DNA insert contained in the pBeloBAC11 vector may be linearized by treatment of the BAC vector with the commercially available enzyme lambda terminase, which cleaves at the unique cosN site; however, this cleavage method yields a full-length BAC clone containing both the insert DNA and the BAC sequences.
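Excision of the cloned segment by Not I digestion can be illustrated at the sequence level. The minimal sketch below assumes the clone is represented as a linear top-strand string in which the vector sequence lies outside the two Not I sites (GC^GGCCGC); because the site is palindromic, searching one strand is enough. The function names are hypothetical, the sketch returns only the top-strand fragment between the two cut positions (ignoring the complementary-strand overhang), and it reports any additional Not I site, for example one inside the insert, as an error.

```python
import re

# NotI recognition sequence (palindromic) and top-strand cut offset: GC^GGCCGC
NOT_I_SITE = "GCGGCCGC"
NOT_I_CUT_OFFSET = 2


def not_i_cut_positions(seq: str) -> list[int]:
    """0-based top-strand cut coordinates (each cut falls just before seq[i])."""
    pattern = f"(?={NOT_I_SITE})"  # zero-width lookahead also catches overlapping sites
    return [m.start() + NOT_I_CUT_OFFSET for m in re.finditer(pattern, seq.upper())]


def excise_insert(linear_clone: str) -> str:
    """Return the top-strand fragment between the two NotI cuts of a linearized clone.

    Assumes the vector sequence lies outside the two NotI sites flanking the insert.
    """
    cuts = not_i_cut_positions(linear_clone)
    if len(cuts) != 2:
        raise ValueError(f"expected exactly two NotI sites, found {len(cuts)}")
    return linear_clone[cuts[0]:cuts[1]]
```

For a toy clone `"AAAAGCGGCCGCTTTTTGCGGCCGCAAAA"`, the cuts fall at positions 6 and 19 and `excise_insert` returns `"GGCCGCTTTTTGC"`.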

5. Delivery Of The Recombinant Vectors

In order to effect expression of the polynucleotides and polynucleotide constructs of the invention, these constructs must be delivered into a cell. This delivery may be accomplished in vitro, as in laboratory procedures for transforming cell lines, or in vivo or ex vivo, as in the treatment of certain disease states.

One mechanism is viral infection, where the expression construct is encapsulated in an infectious viral particle.

Several non-viral methods for the transfer of polynucleotides into cultured mammalian cells are also contemplated by the present invention, and include, without being limited to, calcium phosphate precipitation (Graham et al., 1973; Chen et al., 1987), DEAE-dextran (Gopal, 1985), electroporation (Tur-Kaspa et al., 1986; Potter et al., 1984), direct microinjection (Harland et al., 1985), DNA-loaded liposomes (Nicolau et al., 1982; Fraley et al., 1979), and receptor-mediated transfection (Wu and Wu, 1987; 1988). Some of these techniques may be successfully adapted for in vivo or ex vivo use.

Once the expression polynucleotide has been delivered into the cell, it may be stably integrated into the genome of the recipient cell. This integration may be in the cognate location and orientation via homologous recombination (gene replacement), or it may be in a random, non-specific location (gene augmentation). In yet further embodiments, the nucleic acid may be stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid segments or "episomes" encode sequences sufficient to permit maintenance and replication independent of or in synchronization with the host cell cycle.

One specific embodiment of a method for delivering a protein or peptide to the interior of a cell of a vertebrate in vivo comprises the step of introducing a preparation comprising a physiologically acceptable carrier and a naked polynucleotide operatively coding for the polypeptide of interest into the interstitial space of a tissue comprising the cell, whereby the naked polynucleotide is taken up into the interior of the cell and has a physiological effect. This is particularly applicable for transfer in vitro, but it may be applied in vivo as well.
Compositions for use in vitro and in vivo comprising a "naked" polynucleotide are described in PCT application N° WO 90/11092 (Vical Inc.), and also in PCT application No. WO 95/11307 (Institut Pasteur, INSERM, Universite d'Ottawa), as well as in the articles of Tacson et al. (1996) and of Huygen et al. (1996).
In still another embodiment of the invention, the transfer of a naked polynucleotide of the invention, including a polynucleotide construct of the invention, into cells may be carried out by particle bombardment (biolistics), the particles being DNA-coated microprojectiles accelerated to a high velocity, allowing them to pierce cell membranes and enter cells without killing them, as described by Klein et al. (1987).

In a further embodiment, the polynucleotide of the invention may be entrapped in a liposome (Ghosh and Bacchawat, 1991; Wong et al., 1980; Nicolau et al., 1987).

In a specific embodiment, the invention provides a composition for the in vivo production of the PG-3 protein or polypeptide described herein. It comprises a naked polynucleotide operatively coding for this polypeptide, in solution in a physiologically acceptable carrier, and suitable for introduction into a tissue to cause cells of the tissue to express said protein or polypeptide.

The amount of vector to be injected into the desired host organism varies according to the site of injection. As an indicative dose, between 0.1 and 100 µg of the vector may be injected into an animal body, preferably a mammal body, for example a mouse body.

In another embodiment, the vector according to the invention may be introduced in vitro into a host cell, preferably a host cell previously harvested from the animal to be treated, and more preferably a somatic cell such as a muscle cell. In a subsequent step, the cell that has been transformed with the vector coding for the desired PG-3 polypeptide or the desired fragment thereof is reintroduced into the animal body in order to deliver the recombinant protein within the body either locally or systemically.

CELL HOSTS 

Another object of the invention consists of a host cell that has been transformed or transfected with one of the polynucleotides described herein, and in particular a polynucleotide either comprising a PG-3 regulatory polynucleotide or the coding sequence for the PG-3 polypeptide in a polynucleotide selected from the group consisting of SEQ ID Nos 1 and 2 or a fragment or a variant thereof. Also included are host cells that are transformed (prokaryotic cells) or that are transfected (eukaryotic cells) with a recombinant vector such as one of those described above. More particularly, the cell hosts of the present invention can comprise any of the polynucleotides described in the "Genomic Sequences Of The PG3 Gene" section, the "PG-3 cDNA Sequences" section, the "Coding Regions" section, the "Polynucleotide constructs" section, and the "Oligonucleotide Probes And Primers" section.
A further recombinant cell host according to the invention comprises a polynucleotide containing a biallelic marker selected from the group consisting of A1 to A80, and the complements thereof.

An additional recombinant cell host according to the invention comprises any of the vectors described herein, more particularly any of the vectors described in the "Recombinant Vectors" section.

Preferred host cells used as recipients for the expression vectors of the invention are the 
following: 

a) Prokaryotic host cells: Escherichia coli strains (e.g. DH5-α strain), Bacillus subtilis, Salmonella typhimurium, and strains from species like Pseudomonas, Streptomyces and Staphylococcus.

b) Eukaryotic host cells: HeLa cells (ATCC N° CCL2; N° CCL2.1; N° CCL2.2), CV-1 cells (ATCC N° CCL70), COS cells (ATCC N° CRL1650; N° CRL1651), Sf-9 cells (ATCC N° CRL1711), C127 cells (ATCC N° CRL-1804), 3T3 (ATCC N° CRL-6361), CHO (ATCC N° CCL-61), human kidney 293 (ATCC N° 45504; N° CRL-1573) and BHK (ECACC N° 84100501; N° 84111301).

c) Other mammalian host cells. 

PG-3 gene expression in mammalian, and typically human, cells may be rendered defective, or alternatively expression may be provided by the insertion of a PG-3 genomic or cDNA sequence with the replacement of the PG-3 gene counterpart in the genome of an animal cell by a PG-3 polynucleotide according to the invention. These genetic alterations may be generated by homologous recombination events using specific DNA constructs that have been previously described.

Mammalian zygotes, such as murine zygotes, are one kind of cell host that may be used. For example, murine zygotes may undergo microinjection with a purified DNA molecule of interest, for example a purified DNA molecule that has previously been adjusted to a concentration ranging from 1 ng/µl (for BAC inserts) to 3 ng/µl (for P1 bacteriophage inserts) in 10 mM Tris-HCl, pH 7.4, 250 µM EDTA containing 100 mM NaCl, 30 µM spermine and 70 µM spermidine. When the DNA to be microinjected is large, polyamines and high salt concentrations can be used in order to avoid mechanical breakage of this DNA, as described by Schedl et al. (1993b).
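Adjusting purified insert DNA to the microinjection concentration is again a C1 x V1 = C2 x V2 calculation. The sketch below reads the range given above as 1 ng/µl for BAC inserts and 3 ng/µl for P1 inserts (an interpretation of the garbled units in the original); the function name and the example stock concentration are hypothetical.

```python
# Target DNA concentrations for microinjection, ng/µl (units read as ng/µl, see lead-in)
TARGET_NG_PER_UL = {"BAC": 1.0, "P1": 3.0}


def microinjection_dilution(stock_ng_per_ul: float, insert_type: str,
                            final_volume_ul: float) -> tuple[float, float]:
    """Return (µl of DNA stock, µl of microinjection buffer) from C1 * V1 = C2 * V2."""
    target = TARGET_NG_PER_UL[insert_type]
    if stock_ng_per_ul < target:
        raise ValueError("stock is more dilute than the target; concentrate it first")
    dna_ul = target * final_volume_ul / stock_ng_per_ul
    return dna_ul, final_volume_ul - dna_ul
```

For a hypothetical P1 insert stock at 50 ng/µl, `microinjection_dilution(50.0, "P1", 100.0)` returns `(6.0, 94.0)`: 6 µl of DNA brought to 100 µl with microinjection buffer.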

Any one of the polynucleotides of the invention, including the DNA constructs described herein, may be introduced into an embryonic stem (ES) cell line, preferably a mouse ES cell line. ES cell lines are derived from pluripotent, uncommitted cells of the inner cell mass of pre-implantation blastocysts. Preferred ES cell lines are the following: ES-E14TG2a (ATCC n° CRL-1821), ES-D3 (ATCC n° CRL-1934 and n° CRL-11632), YS001 (ATCC n° CRL-11776), 36.5 (ATCC n° CRL-11116). To maintain ES cells in an uncommitted state, they are cultured in the presence of growth-inhibited feeder cells which provide the appropriate signals to preserve this embryonic phenotype and serve as a matrix for ES cell adherence. Preferred feeder cells consist of primary embryonic fibroblasts that are established from tissue of day 13-day 14 embryos of virtually any mouse strain, that are maintained in culture, such as described by Abbondanzo et al. (1993), and that are inhibited in growth by irradiation, such as described by Robertson (1987), or by the presence of an inhibitory concentration of LIF, such as described by Pease and Williams (1990).

The constructs in the host cells can be used in a conventional manner to produce the gene 
product encoded by the recombinant sequence. 

Following transformation of a suitable host and growth of the host to an appropriate cell density, the selected promoter is induced by appropriate means, such as temperature shift or chemical induction, and cells are cultivated for an additional period.

Cells are typically harvested by centrifugation, disrupted by physical or chemical means, 
and the resulting crude extract retained for further purification. 

Microbial cells employed in the expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to the skilled artisan.

TRANSGENIC ANIMALS 
The terms "transgenic animals" or "host animals" are used herein to designate animals that have their genome genetically and artificially manipulated so as to include one of the nucleic acids according to the invention. Preferred animals are non-human mammals and include those belonging to a genus selected from Mus (e.g. mice), Rattus (e.g. rats) and Oryctolagus (e.g. rabbits) which have their genome artificially and genetically altered by the insertion of a nucleic acid according to the invention. In one embodiment, the invention encompasses non-human host mammals and animals comprising a recombinant vector of the invention or a PG-3 gene disrupted by homologous recombination with a knock-out vector.
The transgenic animals of the invention all include within a plurality of their cells a cloned recombinant or synthetic DNA sequence, more specifically one of the purified or isolated nucleic acids comprising a PG-3 coding sequence, a PG-3 regulatory polynucleotide, a polynucleotide construct, or a DNA sequence encoding an antisense polynucleotide such as described in the present specification.

Generally, a transgenic animal according to the present invention comprises any one of the polynucleotides, the recombinant vectors and the cell hosts described in the present invention. More particularly, the transgenic animals of the present invention can comprise any of the polynucleotides described in the "Genomic Sequences Of The PG3 Gene" section, the "PG-3 cDNA Sequences" section, the "Coding Regions" section, the "Polynucleotide constructs" section, the "Oligonucleotide Probes And Primers" section, the "Recombinant Vectors" section and the "Cell Hosts" section.
Further transgenic animals according to the invention contain in their somatic cells and/or in their germ line cells a polynucleotide comprising a biallelic marker selected from the group consisting of A1 to A80, and the complements thereof.

In a first preferred embodiment, these transgenic animals may be good experimental models
for studying the diverse pathologies related to cell differentiation, in particular transgenic animals
into whose genome one or several copies of a polynucleotide encoding a native PG-3 protein, or
alternatively a mutant PG-3 protein, have been inserted.

In a second preferred embodiment, these transgenic animals may express a desired
polypeptide of interest under the control of the regulatory polynucleotides of the PG-3 gene, leading
to good yields in the synthesis of this protein of interest and possibly to tissue-specific expression
of this protein of interest.

The design of the transgenic animals of the invention may be made according to the
conventional techniques well known to those skilled in the art. For more details regarding the
production of transgenic animals, and specifically transgenic mice, reference may be made to US
Patent Nos. 4,873,191, issued Oct. 10, 1989; 5,464,764, issued Nov. 7, 1995; and 5,789,215, issued
Aug. 4, 1998, which disclose methods for producing transgenic mice.

Transgenic animals of the present invention are produced by the application of procedures
which result in an animal with a genome that has incorporated exogenous genetic material. The
procedure involves obtaining the genetic material, or a portion thereof, which encodes either a PG-3
coding sequence, a PG-3 regulatory polynucleotide or a DNA sequence encoding a PG-3 antisense
polynucleotide such as described in the present specification.

A recombinant polynucleotide of the invention is inserted into an embryonic stem (ES)
cell line. The insertion is preferably made using electroporation, such as described by Thomas et
al. (1987). The cells subjected to electroporation are screened (e.g. by selection via selectable
markers, by PCR or by Southern blot analysis) to find positive cells which have integrated the
exogenous recombinant polynucleotide into their genome, preferably via a homologous
recombination event. An illustrative positive-negative selection procedure that may be used
according to the invention is described by Mansour et al. (1988).

Then, the positive cells are isolated, cloned and injected into 3.5-day-old blastocysts from
mice, such as described by Bradley (1987). The blastocysts are then inserted into a female host
animal and allowed to grow to term.

Alternatively, the positive ES cells are brought into contact with embryos at the 2.5-day-old
8-16 cell stage (morulae), such as described by Wood et al. (1993) or by Nagy et al. (1993), the
ES cells being internalized so as to colonize extensively the blastocyst, including the cells which
will give rise to the germ line.

The offspring of the female host are tested to determine which animals are transgenic, i.e.
include the inserted exogenous DNA sequence, and which are wild-type.



WO 01/14550 PCT/IB00/01098 


Thus, the present invention also concerns a transgenic animal containing a nucleic acid, a 
recombinant expression vector or a recombinant host cell according to the invention. 

Recombinant Cell Lines Derived From The Transgenic Animals Of The Invention.

A further object of the invention consists of recombinant host cells obtained from a
transgenic animal described herein. In one embodiment the invention encompasses cells derived
from non-human host mammals and animals comprising a recombinant vector of the invention or a
PG-3 gene disrupted by homologous recombination with a knock out vector.

Recombinant cell lines may be established in vitro from cells obtained from any tissue of a
transgenic animal according to the invention, for example by transfection of primary cell cultures
with vectors expressing oncogenes such as SV40 large T antigen, as described by Chou (1989) and
Shay et al. (1991).

METHODS FOR SCREENING SUBSTANCES INTERACTING WITH A PG-3 POLYPEPTIDE

For the purpose of the present invention, a ligand means a molecule, such as a protein, a
peptide, an antibody or any synthetic chemical compound, capable of binding to the PG-3 protein or
one of its fragments or variants, or of modulating the expression of the polynucleotide coding for
PG-3 or a fragment or variant thereof. These molecules may be used in therapeutic compositions,
preferably therapeutic compositions acting against cancer.

In the ligand screening method according to the present invention, a biological sample or a
defined molecule to be tested as a putative ligand of the PG-3 protein is brought into contact with
the corresponding purified PG-3 protein, for example the corresponding purified recombinant PG-3
protein produced by a recombinant cell host as described hereinbefore, in order to form a complex
between this protein and the putative ligand molecule to be tested.

As an illustrative example, to study the interaction of the PG-3 protein, or a fragment
comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 3, with drugs or
small molecules, such as molecules generated through combinatorial chemistry approaches, the
microdialysis coupled to HPLC method described by Wang et al. (1997) or the affinity capillary
electrophoresis method described by Bush et al. (1997) may be used.

In further methods, peptides, drugs, fatty acids, lipoproteins, or small molecules which
interact with the PG-3 protein, or a fragment comprising a contiguous span of at least 6 amino acids,
preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100
amino acids of SEQ ID No 3, may be identified using assays such as the following. The molecule to
be tested for binding is labeled with a detectable label, such as a fluorescent, radioactive, or
enzymatic tag, and placed in contact with immobilized PG-3 protein, or a fragment thereof, under
conditions which permit specific binding to occur. After removal of non-specifically bound
molecules, bound molecules are detected using appropriate means.
Another object of the present invention consists of methods and kits for the screening of
candidate substances that interact with a PG-3 polypeptide.

The present invention pertains to methods for screening substances of interest that interact
with a PG-3 protein or one fragment or variant thereof. By their capacity to bind covalently or
non-covalently to a PG-3 protein or to a fragment or variant thereof, these substances or molecules
may be advantageously used both in vitro and in vivo.

In vitro, said interacting molecules may be used as detection means in order to identify the 
presence of a PG-3 protein in a sample, preferably a biological sample. 

A method for the screening of a candidate substance comprises the following steps:

a) providing a polypeptide consisting of a PG-3 protein or a fragment comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 3;

b) obtaining a candidate substance;

c) bringing into contact said polypeptide with said candidate substance;

d) detecting the complexes formed between said polypeptide and said candidate
substance.

The invention further concerns a kit for the screening of a candidate substance interacting
with the PG-3 polypeptide, wherein said kit comprises:

a) a PG-3 protein having an amino acid sequence selected from the group
consisting of the amino acid sequences of SEQ ID No 3, or a peptide fragment comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 3;

b) optionally, means useful to detect the complex formed between the PG-3 protein
or a peptide fragment or a variant thereof and the candidate substance.

In a preferred embodiment of the kit described above, the detection means consist of
monoclonal or polyclonal antibodies directed against the PG-3 protein or a peptide fragment or a
variant thereof.

Various candidate substances or molecules can be assayed for interaction with a PG-3
polypeptide. These substances or molecules include, without being limited to, natural or synthetic
organic compounds or molecules of biological origin such as polypeptides. When the candidate
substance or molecule consists of a polypeptide, this polypeptide may be the resulting expression
product of a phage clone belonging to a phage-based random peptide library, or alternatively the
polypeptide may be the resulting expression product of a cDNA library cloned in a vector suitable
for performing a two-hybrid screening assay.

The invention also pertains to kits useful for performing the hereinbefore described
screening method. Preferably, such kits comprise a PG-3 polypeptide or a fragment or a variant
thereof, and optionally means useful to detect the complex formed between the PG-3 polypeptide or
its fragment or variant and the candidate substance. In a preferred embodiment, the detection means
consist of monoclonal or polyclonal antibodies directed against the corresponding PG-3 polypeptide
or a fragment or a variant thereof.

A. Candidate ligands obtained from random peptide libraries 

In a particular embodiment of the screening method, the putative ligand is the expression
product of a DNA insert contained in a phage vector (Parmley and Smith, 1988). Specifically,
random peptide phage libraries are used. The random DNA inserts encode peptides of 8 to 20
amino acids in length (Oldenburg K.R. et al., 1992; Valadon P. et al., 1996; Lucas A.H., 1994;
Westerink M.A.J., 1995; Felici F. et al., 1991). According to this particular embodiment, the
recombinant phages expressing a protein that binds to the immobilized PG-3 protein are retained, and
the complex formed between the PG-3 protein and the recombinant phage may be subsequently
immunoprecipitated by a polyclonal or a monoclonal antibody directed against the PG-3 protein.

Once the ligand library in recombinant phages has been constructed, the phage population
is brought into contact with the immobilized PG-3 protein. Then the preparation of complexes is
washed in order to remove the non-specifically bound recombinant phages. The phages that bind
specifically to the PG-3 protein are then eluted by a buffer (acid pH) or immunoprecipitated by the
monoclonal antibody produced by the anti-PG-3 hybridoma, and this phage population is
subsequently amplified by infection of bacteria (for example E. coli). The selection step
may be repeated several times, preferably 2-4 times, in order to select the most specific
recombinant phage clones. The last step consists of characterizing the peptide produced by the
selected recombinant phage clones, either by expression in infected bacteria and isolation, by
expressing the phage insert in another host-vector system, or by sequencing the insert contained in
the selected recombinant phages.
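Why repeating the selection step 2-4 times helps can be illustrated with a simple two-population model. This is an illustrative sketch only: the per-round retention fractions for PG-3-binding and non-binding phages are hypothetical, and amplification in E. coli is assumed to be unbiased, so only the washing and elution step changes the composition of the pool.

```python
def panning_enrichment(binder_fraction, specific_retention, background_retention, rounds):
    """Binder fraction of the phage pool after each round of panning on PG-3.

    Hypothetical two-population model: in each round, washing and elution keep
    a fraction `specific_retention` of PG-3-binding phages and a fraction
    `background_retention` of non-binding phages. Amplification in E. coli is
    assumed to be unbiased, so it rescales both populations without changing
    their ratio and is not modeled explicitly.
    """
    history = []
    f = binder_fraction
    for _ in range(rounds):
        kept_binders = f * specific_retention
        kept_background = (1.0 - f) * background_retention
        f = kept_binders / (kept_binders + kept_background)
        history.append(f)
    return history


# Hypothetical starting library: one PG-3 binder per million phages.
pool = panning_enrichment(1e-6, specific_retention=0.1, background_retention=1e-4, rounds=3)
```

With these assumed values each round enriches binders about 1000-fold, so the pool goes from roughly 0.1% binders after the first round to roughly half after the second and nearly all after the third. Real enrichment depends on library quality and washing stringency.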

B. Candidate ligands obtained by competition experiments. 

Alternatively, peptides, drugs or small molecules which bind to the PG-3 protein, or a
fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino
acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 3, may
be identified in competition experiments. In such assays, the PG-3 protein, or a fragment thereof, is
immobilized to a surface, such as a plastic plate. Increasing amounts of the peptides, drugs or small
molecules are placed in contact with the immobilized PG-3 protein, or a fragment thereof, in the
presence of a detectably labeled known PG-3 protein ligand. For example, the PG-3 ligand may be
detectably labeled with a fluorescent, radioactive, or enzymatic tag. The ability of the test molecule
to bind the PG-3 protein, or a fragment thereof, is determined by measuring the amount of
detectably labeled known ligand bound in the presence of the test molecule. A decrease in the
amount of known ligand bound to the PG-3 protein, or a fragment thereof, when the test molecule is
present indicates that the test molecule is able to bind to the PG-3 protein, or a fragment thereof.
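The competition readout can be related to the affinity of the test molecule under a simple model. The sketch below assumes, beyond what the text states, one binding site per immobilized PG-3 molecule, reversible binding at equilibrium, and negligible ligand depletion. Under these assumptions the labeled-ligand occupancy falls as the test molecule is added, and the concentration that halves it (IC50) relates to the test molecule's inhibition constant Ki through the Cheng-Prusoff equation. All concentrations are hypothetical.

```python
def labeled_ligand_occupancy(label_conc, label_kd, competitor_conc, competitor_ki):
    """Fraction of immobilized PG-3 sites occupied by the labeled known ligand.

    One-site competitive equilibrium, free concentrations ~ total concentrations:
        occupancy = (L/Kd) / (1 + L/Kd + I/Ki)
    """
    a = label_conc / label_kd
    b = competitor_conc / competitor_ki
    return a / (1.0 + a + b)


def ic50_from_ki(competitor_ki, label_conc, label_kd):
    """Cheng-Prusoff relation for competitive binding: IC50 = Ki * (1 + L/Kd)."""
    return competitor_ki * (1.0 + label_conc / label_kd)


# Hypothetical dose series (nM): labeled ligand at 10 nM with Kd = 10 nM,
# test molecule with Ki = 5 nM.
doses = [0.0, 1.0, 10.0, 100.0, 1000.0]
signal = [labeled_ligand_occupancy(10.0, 10.0, d, 5.0) for d in doses]
```

The bound labeled ligand decreases monotonically with dose, which is the readout described above. With these values the occupancy drops from 0.5 without test molecule to 0.25 at the IC50 of 10 nM.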

C. Candidate ligands obtained by affinity chromatography. 
Proteins or other molecules interacting with the PG-3 protein, or a fragment comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at
least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 3, can also be found using affinity
columns which contain the PG-3 protein, or a fragment thereof. The PG-3 protein, or a fragment
thereof, may be attached to the column using conventional techniques including chemical coupling
to a suitable column matrix such as agarose, Affi-Gel®, or other matrices familiar to those of skill
in the art. In some embodiments of this method, the affinity column contains chimeric proteins in
which the PG-3 protein, or a fragment thereof, is fused to glutathione S-transferase (GST). A
mixture of cellular proteins or pool of expressed proteins as described above is applied to the
affinity column. Proteins or other molecules interacting with the PG-3 protein, or a fragment
thereof, attached to the column can then be isolated and analyzed on 2-D electrophoresis gel as
described in Ramunsen et al. (1997). Alternatively, the proteins retained on the affinity column can
be purified by electrophoresis-based methods and sequenced. The same method can be used to
isolate antibodies, to screen phage display products, or to screen phage display human antibodies.

D. Candidate ligands obtained by optical biosensor methods

Proteins interacting with the PG-3 protein, or a fragment comprising a contiguous span of at
least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25,
30, 40, 50, or 100 amino acids of SEQ ID No 3, can also be screened by using an optical biosensor
as described in Edwards and Leatherbarrow (1997) and also in Szabo et al. (1995). This technique
permits the detection of interactions between molecules in real time, without the need for labeled
molecules. This technique is based on the surface plasmon resonance (SPR) phenomenon. Briefly,
the candidate ligand molecule to be tested is attached to a surface (such as a carboxymethyl dextran
matrix). A light beam is directed towards the side of the surface that does not contain the sample to
be tested and is reflected by said surface. The SPR phenomenon causes a decrease in the intensity
of the reflected light at a specific combination of angle and wavelength. The binding of candidate
ligand molecules causes a change in the refractive index on the surface, which change is detected as
a change in the SPR signal. For screening of candidate ligand molecules or substances that are able
to interact with the PG-3 protein, or a fragment thereof, the PG-3 protein, or a fragment thereof, is
immobilized onto a surface. This surface consists of one side of a flow cell through which flows the
candidate molecule to be assayed. The binding of the candidate molecule on the PG-3 protein, or a
fragment thereof, is detected as a change of the SPR signal. The candidate molecules tested may
be proteins, peptides, carbohydrates, lipids, or small molecules generated by combinatorial
chemistry. This technique may also be performed by immobilizing eukaryotic or prokaryotic cells
or lipid vesicles exhibiting an endogenous or a recombinantly expressed PG-3 protein at their
surface.

The main advantage of the method is that it allows the determination of the association rate
between the PG-3 protein and molecules interacting with the PG-3 protein. It is thus possible to
specifically select ligand molecules interacting with the PG-3 protein, or a fragment thereof, on the
basis of strong or, conversely, weak association constants.
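The association rate mentioned above is usually extracted from the sensorgrams with a kinetic model. The sketch below assumes a simple 1:1 (Langmuir) interaction without mass-transport limitation, which may not hold for every PG-3 ligand. During injection the response then follows R(t) = Req(1 - exp(-kobs t)) with kobs = ka C + kd, so regressing kobs against several analyte concentrations C gives ka (slope), kd (intercept) and the equilibrium constant KD = kd/ka. The rate constants and concentrations below are hypothetical.

```python
import math


def association_response(t, conc, ka, kd, rmax):
    """SPR response at time t (s) while analyte at `conc` (M) flows over PG-3.

    1:1 Langmuir model without mass-transport limitation:
        R(t) = Req * (1 - exp(-kobs * t)), kobs = ka*C + kd, Req = Rmax*ka*C/kobs
    """
    kobs = ka * conc + kd
    req = rmax * ka * conc / kobs
    return req * (1.0 - math.exp(-kobs * t))


def fit_rate_constants(concs, kobs_values):
    """Ordinary least-squares fit of kobs = ka*C + kd; returns (ka, kd, KD)."""
    n = len(concs)
    mean_c = sum(concs) / n
    mean_k = sum(kobs_values) / n
    sxx = sum((c - mean_c) ** 2 for c in concs)
    sxy = sum((c - mean_c) * (k - mean_k) for c, k in zip(concs, kobs_values))
    ka = sxy / sxx              # slope: association rate constant (1/(M*s))
    kd = mean_k - ka * mean_c   # intercept: dissociation rate constant (1/s)
    return ka, kd, kd / ka      # KD = kd/ka (M)


# Hypothetical PG-3/analyte pair: ka = 1e5 /M/s, kd = 1e-3 /s (KD = 10 nM).
concs = [1e-8, 2e-8, 5e-8, 1e-7]
kobs = [1e5 * c + 1e-3 for c in concs]  # stand-in for per-curve exponential fits
ka, kd, KD = fit_rate_constants(concs, kobs)
```

In practice each kobs would first be obtained by fitting an association phase; instrument software often fits all sensorgrams globally instead, which is generally more robust than this two-step linearization.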

E. Candidate ligands obtained through a two-hybrid screening assay
The yeast two-hybrid system is designed to study protein-protein interactions in vivo (Fields
and Song, 1989), and relies upon the fusion of a bait protein to the DNA binding domain of the
yeast Gal4 protein. This technique is also described in US Patent No. 5,667,973 and US Patent
No. 5,283,173.

The general procedure of library screening by the two-hybrid assay may be performed as
described by Harper et al. (1993), by Cho et al. (1998) or by Fromont-Racine et al. (1997).

The bait protein or polypeptide consists of a PG-3 polypeptide or a fragment comprising a 
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at 
least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 3. 

More precisely, the nucleotide sequence encoding the PG-3 polypeptide or a fragment or
variant thereof is fused to a polynucleotide encoding the DNA binding domain of the GAL4 protein,
the fused nucleotide sequence being inserted in a suitable expression vector, for example pAS2 or
pM3.

Then, a human cDNA library is constructed in a specially designed vector, such that the
human cDNA insert is fused to a nucleotide sequence in the vector that encodes the transcriptional
activation domain of the GAL4 protein. Preferably, the vector used is the pACT vector. The
polypeptides encoded by the nucleotide inserts of the human cDNA library are termed "prey" polypeptides.

A third vector contains a detectable marker gene, such as the beta-galactosidase gene or the CAT
gene, that is placed under the control of a regulatory sequence that is responsive to the binding of a
complete Gal4 protein containing both the transcriptional activation domain and the DNA binding
domain. For example, the vector pG5EC may be used.

Two different yeast strains are also used. As an illustrative but non-limiting example, the
two different yeast strains may be the following:

- Y190, the genotype of which is (MATa, leu2-3,112, ura3-12, trp1-901, his3-Δ200, ade2-101,
gal4Δ gal80Δ, URA3 GAL-lacZ, LYS GAL-HIS3, cyhr);
- Y187, the genotype of which is (MATα, gal4, gal80, his3, trp1-901, ade2-101, ura3-52,
leu2-3,112, URA3 GAL-lacZ, met-), which is the opposite mating type of Y190.
Briefly, 20 µg of pAS2/PG-3 and 20 µg of pACT-cDNA library are co-transformed into
yeast strain Y190. The transformants are selected for growth on minimal media lacking histidine,
leucine and tryptophan, but containing the histidine synthesis inhibitor 3-AT (50 mM). Positive
colonies are screened for beta-galactosidase by filter lift assay. The double positive colonies (His+,
beta-gal+) are then grown on plates lacking histidine and leucine, but containing tryptophan and
cycloheximide (10 mg/ml), to select for loss of the pAS2/PG-3 plasmids but retention of the
pACT-cDNA library plasmids. The resulting Y190 strains are mated with Y187 strains expressing
PG-3 or non-related control proteins, such as cyclophilin B, lamin, or SNF1, as Gal4 fusions as
described by Harper et al. (1993) and by Bram et al. (1993), and screened for beta-galactosidase by
filter lift assay. Yeast clones that are beta-gal+ after mating with the control Gal4 fusions are
considered false positives.
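The decision logic of this screen can be summarized as a small classification function. This is an illustrative sketch: the record fields are hypothetical names for the readouts described above, and it follows the usual interpretation that a clone still activating the reporter (beta-gal+) with an unrelated control Gal4 fusion is a false positive.

```python
from dataclasses import dataclass, field


@dataclass
class PreyClone:
    """Hypothetical record of the readouts for one pACT-cDNA clone."""
    name: str
    his_positive: bool      # growth without histidine (in the presence of 3-AT)
    lacz_positive: bool     # beta-gal+ in the first filter lift assay
    lacz_with_pg3: bool     # beta-gal+ after mating with the Y187 PG-3 fusion
    lacz_with_controls: dict = field(default_factory=dict)  # control fusion -> beta-gal+?


def classify(clone: PreyClone) -> str:
    """Apply the screening criteria in the order described in the text."""
    if not (clone.his_positive and clone.lacz_positive):
        return "not selected"
    # Reporter activation with unrelated Gal4 fusions (e.g. lamin) shows the
    # prey does not need PG-3 to activate transcription.
    if any(clone.lacz_with_controls.values()):
        return "false positive"
    if clone.lacz_with_pg3:
        return "candidate interactor"
    return "not confirmed"


controls = {"cyclophilin B": False, "lamin": False, "SNF1": False}
hits = [
    PreyClone("clone-1", True, True, True, dict(controls)),
    PreyClone("clone-2", True, True, True, {**controls, "lamin": True}),
]
calls = {c.name: classify(c) for c in hits}
```

Here clone-1 would be kept as a candidate PG-3 interactor, while clone-2 is discarded because it also activates the reporter with the lamin control.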

In another embodiment of the two-hybrid method according to the invention, interaction
between PG-3 or a fragment or variant thereof and cellular proteins may be assessed using the
Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech). As described in the manual
accompanying the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech), nucleic acids
encoding the PG-3 protein or a portion thereof are inserted into an expression vector such that they are
in frame with DNA encoding the DNA binding domain of the yeast transcriptional activator GAL4. A
desired cDNA, preferably human cDNA, is inserted into a second expression vector such that it is
in frame with DNA encoding the activation domain of GAL4. The two expression plasmids are
transformed into yeast and the yeast are plated on selection medium which selects for expression of
selectable markers on each of the expression vectors as well as GAL4-dependent expression of the
HIS3 gene. Transformants capable of growing on medium lacking histidine are screened for
GAL4-dependent lacZ expression. Those cells which are positive in both the histidine selection and the
lacZ assay indicate an interaction between PG-3 and the protein or peptide encoded by the initially
selected cDNA insert.

METHOD FOR SCREENING SUBSTANCES INTERACTING WITH THE REGULATORY SEQUENCES OF THE PG-3 GENE.
The present invention also concerns a method for screening substances or molecules that 
are able to interact with the regulatory sequences of the PG-3 gene, such as for example promoter or 
enhancer sequences. 

Nucleic acids encoding proteins which are able to interact with the regulatory sequences of
the PG-3 gene, more particularly a nucleotide sequence selected from the group consisting of the
polynucleotides of the 5' and 3' regulatory regions or a fragment or variant thereof, and preferably a
variant comprising one of the biallelic markers of the invention, may be identified by using a
one-hybrid system, such as that described in the booklet enclosed in the Matchmaker One-Hybrid
System kit from Clontech (Catalog Ref. No. K1603-1). Briefly, the target nucleotide sequence is
cloned upstream of a selectable reporter sequence and the resulting DNA construct is integrated in
the yeast genome (Saccharomyces cerevisiae). The yeast cells containing the reporter sequence in
their genome are then transformed with a library consisting of fusion molecules between cDNAs
encoding candidate proteins for binding onto the regulatory sequences of the PG-3 gene and
sequences encoding the activator domain of a yeast transcription factor such as GAL4. The
recombinant yeast cells are plated on a culture medium selecting for cells expressing the reporter
sequence. The recombinant yeast cells thus selected contain a fusion protein that is able to bind
onto the target regulatory sequence of the PG-3 gene. Then, the cDNAs encoding the fusion
proteins are sequenced and may be cloned into expression or transcription vectors in vitro. The
binding of the encoded polypeptides to the target regulatory sequences of the PG-3 gene may be
confirmed by techniques familiar to the one skilled in the art, such as gel retardation assays or
DNase protection assays.

Gel retardation assays may also be performed independently in order to screen candidate
molecules that are able to interact with the regulatory sequences of the PG-3 gene, such as described
by Fried and Crothers (1981), Garner and Revzin (1981) and Dent and Latchman (1993). These
techniques are based on the principle that a DNA fragment which is bound to a protein migrates
more slowly than the same unbound DNA fragment. Briefly, the target nucleotide sequence is
labeled. Then the labeled target nucleotide sequence is brought into contact with either a total
nuclear extract from cells containing transcription factors, or with different candidate molecules to
be tested. The interaction between the target regulatory sequence of the PG-3 gene and the
candidate molecule or the transcription factor is detected after gel or capillary electrophoresis as a
retardation in the migration.

METHOD FOR SCREENING LIGANDS THAT MODULATE THE EXPRESSION OF THE PG-3 GENE.
Another subject of the present invention is a method for screening molecules that modulate
the expression of the PG-3 protein. Such a screening method comprises the steps of:

a) cultivating a prokaryotic or a eukaryotic cell that has been transfected with a
nucleotide sequence encoding the PG-3 protein or a variant or a fragment thereof, placed
under the control of its own promoter;

b) bringing into contact the cultivated cell with a molecule to be tested;

c) quantifying the expression of the PG-3 protein or a variant or a fragment thereof.
In an embodiment, the nucleotide sequence encoding the PG-3 protein or a variant or a
fragment thereof comprises an allele of at least one of the biallelic markers A1 to A80, and the
complements thereof.

Using DNA recombination techniques well known to those skilled in the art, the PG-3
protein encoding DNA sequence is inserted into an expression vector, downstream from its
promoter sequence. As an illustrative example, the promoter sequence of the PG-3 gene is
contained in the nucleic acid of the 5' regulatory region.

The quantification of the expression of the PG-3 protein may be realized either at the
mRNA level or at the protein level. In the latter case, polyclonal or monoclonal antibodies may be
used to quantify the amounts of the PG-3 protein that have been produced, for example in an ELISA
or an RIA assay.
In a preferred embodiment, the quantification of the PG-3 mRNA is realized by a
quantitative PCR amplification of the cDNA obtained by a reverse transcription of the total mRNA
of the cultivated PG-3-transfected host cell, using a pair of primers specific for PG-3.
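The text does not specify how the quantitative PCR data are normalized. One widely used option, assumed here, is the comparative Ct (2^-ddCt) method, which expresses PG-3 mRNA in cells exposed to the molecule relative to untreated cells after normalization to a reference transcript amplified from the same cDNA. The reference gene and the Ct values are hypothetical, and the method assumes near-100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control, efficiency=2.0):
    """Fold change of PG-3 mRNA (treated vs. control) by the comparative Ct method.

    Each Ct is a quantitative PCR threshold cycle. `efficiency` is the
    amplification factor per cycle; 2.0 assumes ~100% efficiency for both the
    PG-3 primers and the (hypothetical) reference-gene primers.
    """
    delta_treated = ct_target_treated - ct_ref_treated  # normalize to reference
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return efficiency ** (-delta_delta)


# Hypothetical Ct values: PG-3 crosses the threshold 2 cycles earlier after treatment.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

A value above 1 suggests that the molecule increases PG-3 mRNA levels and a value below 1 that it decreases them; with these hypothetical values the result is a fourfold increase.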

The present invention also concerns a method for screening substances or molecules that
are able to increase, or in contrast to decrease, the level of expression of the PG-3 gene. Such a
method may allow the one skilled in the art to select substances exerting a regulating effect on the
expression level of the PG-3 gene and which may be useful as active ingredients included in
pharmaceutical compositions for treating patients suffering from cancer.

Thus, another aspect of the present invention is a method for screening a candidate
substance or molecule for the ability to modulate the expression of the PG-3 gene, comprising the
following steps:

a) providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid
comprises a nucleotide sequence of the 5' regulatory region or a biologically active fragment or
variant thereof located upstream of a polynucleotide encoding a detectable protein;

b) obtaining a candidate substance; and

c) determining the ability of the candidate substance to modulate the expression levels of
the polynucleotide encoding the detectable protein.
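Step c) can be sketched as a fold-change comparison between cells exposed to the candidate substance and vehicle-treated cells. The background subtraction, the use of replicate means and the two-fold cut-off are assumptions chosen for illustration, not values given in the text; the reporter readout could be, for example, beta-galactosidase, GFP or CAT activity.

```python
from statistics import mean


def reporter_fold_change(treated, control, background=0.0):
    """Mean reporter signal with the candidate substance / mean vehicle signal.

    `treated` and `control` are replicate readings of the detectable protein;
    `background` is the signal of cells lacking the reporter construct.
    """
    t = mean(treated) - background
    c = mean(control) - background
    if c <= 0:
        raise ValueError("control signal must exceed background")
    return t / c


def call_modulator(fold, threshold=2.0):
    """Classify a candidate using a hypothetical symmetric fold-change cut-off."""
    if fold >= threshold:
        return "increases expression"
    if fold <= 1.0 / threshold:
        return "decreases expression"
    return "no effect"
```

For example, replicate readings of 300, 310 and 290 against vehicle readings of 100, 110 and 90 give a threefold change, which this hypothetical cut-off would call an increase. A real screen would also need statistical testing across replicates.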

In a further embodiment, the nucleic acid comprising the nucleotide sequence of the 5'
regulatory region or a biologically active fragment or variant thereof also includes a 5'UTR region
of the PG-3 cDNA of SEQ ID No 2, or one of its biologically active fragments or variants.

Among the preferred polynucleotides encoding a detectable protein, there may be cited 
polynucleotides encoding beta galactosidase, green fluorescent protein (GFP) and chloramphenicol 
acetyl transferase (CAT). 

The invention also pertains to kits useful for performing the herein described screening
method. Preferably, such kits comprise a recombinant vector that allows the expression of a
nucleotide sequence of the 5' regulatory region or a biologically active fragment or variant thereof
located upstream and operably linked to a polynucleotide encoding a detectable protein or the PG-3
protein or a fragment or a variant thereof.

In another embodiment of a method for the screening of a candidate substance or molecule
for the ability to modulate the expression of the PG-3 gene, the method comprises the following
steps:

a) providing a recombinant host cell containing a nucleic acid, wherein said nucleic acid
comprises a 5'UTR sequence of the PG-3 cDNA of SEQ ID No 2, or one of its biologically active
fragments or variants, the 5'UTR sequence or its biologically active fragment or variant being
operably linked to a polynucleotide encoding a detectable protein;

b) obtaining a candidate substance; and




c) determining the ability of the candidate substance to modulate the expression levels of 
the polynucleotide encoding the detectable protein. 

In a specific embodiment of the above screening method, the nucleic acid that comprises a
nucleotide sequence selected from the group consisting of the 5'UTR sequence of the PG-3 cDNA
of SEQ ID No 2 or one of its biologically active fragments or variants includes a promoter
sequence which is endogenous with respect to the PG-3 5'UTR sequence.

In another specific embodiment of the above screening method, the nucleic acid that
comprises a nucleotide sequence selected from the group consisting of the 5'UTR sequence of the
PG-3 cDNA of SEQ ID No 2 or one of its biologically active fragments or variants includes a
promoter sequence which is exogenous with respect to the PG-3 5'UTR sequence defined therein.

In a further preferred embodiment, the nucleic acid comprising the 5'UTR sequence of the
PG-3 cDNA of SEQ ID No 2 or the biologically active fragments thereof includes a biallelic marker
selected from the group consisting of A1 to A80 or the complements thereof.

The invention further encompasses a kit for the screening of a candidate substance for the
ability to modulate the expression of the PG-3 gene, wherein said kit comprises a recombinant
vector that comprises a nucleic acid including a 5'UTR sequence of the PG-3 cDNA of SEQ ID No
2, or one of its biologically active fragments or variants, the 5'UTR sequence or its biologically
active fragment or variant being operably linked to a polynucleotide encoding a detectable protein.
For the design of suitable recombinant vectors useful for performing the screening methods
described above, reference may be made to the section of the present specification in which the
preferred recombinant vectors of the invention are detailed.

Expression levels and patterns of PG-3 may be analyzed by solution hybridization with long
probes as described in International Patent Application No. WO 97/05277. Briefly, the PG-3 cDNA
or the PG-3 genomic DNA described above, or fragments thereof, is inserted at a cloning site
immediately downstream of a bacteriophage (T3, T7 or SP6) RNA polymerase promoter to produce
antisense RNA. Preferably, the PG-3 insert comprises at least 100 consecutive nucleotides
of the genomic DNA sequence or the cDNA sequences. The plasmid is linearized and transcribed
in the presence of ribonucleotides comprising modified ribonucleotides (i.e. biotin-UTP and
DIG-UTP). An excess of this doubly labeled RNA is hybridized in solution with mRNA isolated from
cells or tissues of interest. The hybridization is performed under standard stringent conditions
(40-50°C for 16 hours in an 80% formamide, 0.4 M NaCl buffer, pH 7-8). The unhybridized probe is
removed by digestion with ribonucleases specific for single-stranded RNA (e.g. RNases CL3, T1,
Phy M, U2 or A). The presence of the biotin-UTP modification enables capture of the hybrid on a
microtitration plate coated with streptavidin. The presence of the DIG modification enables the
hybrid to be detected and quantified by ELISA using an anti-DIG antibody coupled to alkaline
phosphatase.
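The ELISA gives a relative signal. One common way to express it as an amount of PG-3 mRNA, assumed here and not stated in the text, is to hybridize known amounts of an in vitro transcribed sense PG-3 RNA in parallel and interpolate sample signals on a linear standard curve. The standards and absorbance values below are hypothetical.

```python
def fit_standard_curve(amounts, signals):
    """Least-squares line signal = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_amount(signal, slope, intercept):
    """Convert a sample absorbance back to an amount of PG-3 mRNA."""
    return (signal - intercept) / slope


# Hypothetical standards: pg of sense PG-3 RNA vs. alkaline phosphatase absorbance.
standards_pg = [0.0, 10.0, 20.0, 40.0]
absorbance = [0.05, 0.25, 0.45, 0.85]
slope, intercept = fit_standard_curve(standards_pg, absorbance)
sample_pg = interpolate_amount(0.65, slope, intercept)  # about 30 pg
```

Interpolation is only meaningful inside the range covered by the standards, where the signal is assumed to be linear in the amount of captured hybrid.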
Quantitative analysis of PG-3 gene expression may also be performed using arrays. As
used herein, the term array means a one dimensional, two dimensional, or multidimensional
arrangement of a plurality of nucleic acids of sufficient length to permit specific detection of
expression of mRNAs capable of hybridizing thereto. For example, the arrays may contain a
plurality of nucleic acids derived from genes whose expression levels are to be assessed. The arrays
may include the PG-3 genomic DNA, the PG-3 cDNA sequences or the sequences complementary
thereto or fragments thereof, particularly those comprising at least one of the biallelic markers
according to the present invention, preferably at least one of the biallelic markers A1 to A80.
Preferably, the fragments are at least 15 nucleotides in length. In other embodiments, the fragments
are at least 25 nucleotides in length. In some embodiments, the fragments are at least 50
nucleotides in length. More preferably, the fragments are at least 100 nucleotides in length. In
another preferred embodiment, the fragments are more than 100 nucleotides in length. In some
embodiments the fragments may be more than 500 nucleotides in length.

For example, quantitative analysis of PG-3 gene expression may be performed with a
complementary DNA microarray as described by Schena et al. (1995 and 1996). Full length PG-3
cDNAs or fragments thereof are amplified by PCR and arrayed from a 96-well microtiter plate onto
silylated microscope slides using high-speed robotics. Printed arrays are incubated in a humid
chamber to allow rehydration of the array elements and rinsed once in 0.2% SDS for 1 min, twice
in water for 1 min and once for 5 min in sodium borohydride solution. The arrays are submerged in
water for 2 min at 95°C, transferred into 0.2% SDS for 1 min, rinsed twice with water, air dried and
stored in the dark at 25°C.

Cell or tissue mRNA is isolated or commercially obtained and probes are prepared by a
single round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 x 14
mm glass coverslip for 6-12 hours at 60°C. Arrays are washed for 5 min at 25°C in low stringency
wash buffer (1X SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash
buffer (0.1X SSC/0.2% SDS). Arrays are scanned in 0.1X SSC using a fluorescence laser
scanning device fitted with a custom filter set. Accurate differential expression measurements are
obtained by taking the average of the ratios of two independent hybridizations.
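The averaging of ratios described above can be illustrated, purely as a sketch, in Python. The sketch assumes that each hybridization has already been reduced to one pair of background-corrected intensities (test sample, reference sample) per array element; the data layout, the function names and the element names are hypothetical and do not correspond to any particular scanner software.

```python
from statistics import mean
from typing import Dict, Tuple

# Hypothetical layout: array element name -> (test intensity, reference intensity).
Hybridization = Dict[str, Tuple[float, float]]


def element_ratio(test_signal: float, reference_signal: float) -> float:
    """Expression ratio of one array element in one hybridization."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return test_signal / reference_signal


def averaged_ratios(hyb_a: Hybridization, hyb_b: Hybridization) -> Dict[str, float]:
    """Average the ratios of two independent hybridizations, element by element.

    Only elements measured in both hybridizations are reported.
    """
    shared = sorted(hyb_a.keys() & hyb_b.keys())
    return {
        element: mean([element_ratio(*hyb_a[element]), element_ratio(*hyb_b[element])])
        for element in shared
    }
```

For example, an element giving ratios of 2.0 and 3.0 in the two hybridizations is reported as 2.5. Averaging the two ratios is the only statistical step stated in the protocol; whether to average raw or log-transformed ratios, and how to correct for background, are choices left open here.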

Quantitative analysis of PG-3 gene expression may also be performed with full length PG-3
cDNAs or fragments thereof in complementary DNA arrays as described by Pietu et al. (1996). The
full length PG-3 cDNA or fragments thereof are PCR amplified and spotted on membranes. Then,
mRNAs originating from various tissues or cells are labeled with radioactive nucleotides. After
hybridization and washing under controlled conditions, the hybridized mRNAs are detected by
phospho-imaging or autoradiography. Duplicate experiments are performed and a quantitative
analysis of differentially expressed mRNAs is then performed.

Alternatively, expression analysis using the PG-3 genomic DNA, the PG-3 cDNA, or
fragments thereof can be done through high density nucleotide arrays as described by Lockhart et
al. (1996) and Sosnowski et al. (1997). Oligonucleotides of 15-50 nucleotides from the sequences of
the PG-3 genomic DNA or the PG-3 cDNA, particularly those comprising at least one of the
biallelic markers according to the present invention, preferably at least one biallelic marker selected
from the group consisting of A1 to A80, or the sequences complementary thereto, are synthesized
directly on the chip (Lockhart et al., supra) or synthesized and then addressed to the chip
(Sosnowski et al., supra). Preferably, the oligonucleotides are about 20 nucleotides in length.
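As a hedged illustration of how candidate oligonucleotides of about 20 nucleotides might be derived from a PG-3 sequence, the following Python sketch tiles a sequence into overlapping probes and can optionally keep only the probes that span a given marker position. The function name, the 0-based coordinates and the step parameter are assumptions made for the example; the sequence in the test is a placeholder, not a PG-3 sequence.

```python
from typing import List, Optional, Tuple


def tile_probes(sequence: str, length: int = 20, step: int = 1,
                must_cover: Optional[int] = None) -> List[Tuple[int, str]]:
    """Cut a sequence into overlapping oligonucleotide probes.

    Returns (start, probe) pairs with 0-based starts. If must_cover is given
    (0-based position of, e.g., a biallelic marker), only probes spanning that
    position are returned.
    """
    if not 15 <= length <= 50:
        raise ValueError("probe length is expected to be 15-50 nucleotides")
    if step < 1:
        raise ValueError("step must be at least 1")
    seq = sequence.upper()
    probes = []
    for start in range(0, len(seq) - length + 1, step):
        # Skip windows that do not contain the marker position, if one was given.
        if must_cover is not None and not (start <= must_cover < start + length):
            continue
        probes.append((start, seq[start:start + length]))
    return probes
```

Restricting probes to those that span a marker position yields one probe set per allele once the marker base is substituted; how many probes per allele are actually placed on a chip is a design decision not specified here.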

PG-3 cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin or
fluorescent dye, are synthesized from the appropriate mRNA population and then randomly
fragmented to an average size of 50 to 100 nucleotides. The probes are then hybridized to the
chip. After washing as described in Lockhart et al., supra, and application of different electric fields
(Sosnowski et al., 1997), the dyes or labeling compounds are detected and quantified. Duplicate
hybridizations are performed. Comparative analysis of the intensity of the signal originating from
cDNA probes on the same target oligonucleotide in different cDNA samples indicates a differential
expression of PG-3 mRNA.

METHODS FOR INHIBITING THE EXPRESSION OF A PG-3 GENE

Other therapeutic compositions according to the present invention advantageously comprise
an oligonucleotide fragment of the nucleic sequence of PG-3 as an antisense tool or a triple helix
tool that inhibits the expression of the corresponding PG-3 gene. A preferred fragment of the
nucleic sequence of PG-3 comprises an allele of at least one of the biallelic markers A1 to A80.

Antisense Approach

Preferred methods using antisense polynucleotides according to the present invention are the
procedures described by Sczakiel et al. (1995).

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that
are complementary to the 5' end of the PG-3 mRNA. In another embodiment, a combination of
different antisense polynucleotides complementary to different parts of the desired targeted gene is
used.

Preferred antisense polynucleotides according to the present invention are complementary 
to a sequence of the mRNAs of PG-3 that contains either the translation initiation codon ATG or a 
splicing donor or acceptor site. 
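A minimal sketch of how an antisense oligonucleotide spanning the translation initiation codon could be derived from an mRNA sequence is given below. It assumes, for simplicity, that the first AUG in the supplied sequence is the initiation codon, which is not necessarily true when upstream AUGs occur in the 5'UTR; the flank size and the function names are hypothetical. Targeting a splice donor or acceptor site would work the same way once the junction position is known.

```python
# Watson-Crick complements for RNA bases.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(rna: str) -> str:
    """Antisense (reverse-complement) RNA of an RNA sequence."""
    return rna.upper().translate(_RNA_COMPLEMENT)[::-1]


def antisense_around_start_codon(mrna: str, flank: int = 10) -> str:
    """Antisense oligonucleotide covering the first AUG plus `flank` bases on each side.

    Assumption: the first AUG is the translation initiation codon.
    """
    rna = mrna.upper().replace("T", "U")  # accept cDNA (T) or mRNA (U) input
    start = rna.find("AUG")
    if start == -1:
        raise ValueError("no AUG codon found")
    left = max(0, start - flank)
    right = min(len(rna), start + 3 + flank)
    return reverse_complement_rna(rna[left:right])
```

With the default flank of 10 the result is 23 nucleotides long, within the 15-200 range mentioned above.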

The antisense nucleic acids should have a length and melting temperature sufficient to
permit formation of an intracellular duplex having sufficient stability to inhibit the expression of the
PG-3 mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for use in gene
therapy are disclosed in Green et al. (1986) and Izant and Weintraub (1984).
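The length and melting temperature requirement above can be screened, very roughly, with the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius. The Python sketch below is only an illustration under that rule's assumptions (short oligonucleotides in a high-salt buffer); it is not a prediction of intracellular duplex stability, for which nearest-neighbour models or experimental data would be needed. The function name is hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C); U is counted like T. Only a rough guide for
    short oligonucleotides (about 14-20 nt) in high-salt buffer; it ignores
    nearest-neighbour effects, strand concentration and the cellular environment.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T") + seq.count("U")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```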

In some strategies, antisense molecules are obtained by reversing the orientation of the PG-3
coding region with respect to a promoter so as to transcribe the opposite strand from that which is
normally transcribed in the cell. The antisense molecules may be transcribed using in vitro
transcription systems such as those which employ T7 or SP6 polymerase to generate the transcript.
Another approach involves transcription of PG-3 antisense nucleic acids in vivo by operably linking
DNA containing the antisense sequence to a promoter in a suitable expression vector.

Alternatively, suitable antisense strategies are those described by Rossi et al. (1991), in
International Applications Nos. WO 94/23026, WO 95/04141 and WO 92/18522, and in European
Patent Application No. EP 0 572 287 A2.

An alternative to the antisense technology that is used according to the present invention
consists of using ribozymes that bind to a target sequence via their complementary
polynucleotide tail and cleave the corresponding RNA by hydrolyzing its target site
(namely "hammerhead ribozymes"). Briefly, the simplified cycle of a hammerhead ribozyme
consists of (1) sequence-specific binding to the target RNA via complementary antisense sequences;
(2) site-specific hydrolysis of the cleavable motif of the target strand; and (3) release of cleavage
products, which gives rise to another catalytic cycle. Indeed, the use of long-chain antisense
polynucleotides (at least 30 bases long) or of ribozymes with long antisense arms is advantageous. A
preferred delivery system for antisense ribozymes is achieved by covalently linking these antisense
ribozymes to lipophilic groups or by using liposomes as a convenient vector. Preferred antisense
ribozymes according to the present invention are prepared as described by Sczakiel et al. (1995).
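As a further illustration, hammerhead ribozymes are generally described as cleaving their target 3' of an NUH triplet (N any base, H = A, C or U), GUC being the classic site. The Python sketch below is an aid built on that assumption rather than part of the cited methods: it lists such triplets in a target RNA so that the ribozyme's antisense arms could be designed around them. The function name is hypothetical.

```python
import re
from typing import List, Tuple


def hammerhead_sites(target: str) -> List[Tuple[int, str]]:
    """Candidate hammerhead cleavage triplets (NUH, with H = A, C or U).

    Returns (0-based position of the triplet, triplet); cleavage would occur
    3' of the triplet. A lookahead pattern is used so that overlapping
    triplets are all reported.
    """
    rna = target.upper().replace("T", "U")  # accept DNA or RNA input
    return [(m.start(), rna[m.start():m.start() + 3])
            for m in re.finditer(r"(?=[ACGU]U[ACU])", rna)]
```

Not every NUH triplet is an equally good site in practice, since target structure and accessibility also matter; the list is only a starting point.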

Triple Helix Approach 

The PG-3 genomic DNA may also be used to inhibit the expression of the PG-3 gene based
on intracellular triple helix formation.

Triple helix oligonucleotides are used to inhibit transcription from a genome. They are
particularly useful for studying alterations in cell activity when it is associated with a particular
gene.

Similarly, a portion of the PG-3 genomic DNA can be used to study the effect of inhibiting
PG-3 transcription within a cell. Traditionally, homopurine sequences were considered the most
useful for triple helix strategies. However, homopyrimidine sequences can also inhibit gene
expression. Such homopyrimidine oligonucleotides bind to the major groove at
homopurine:homopyrimidine sequences. Thus, both types of sequences from the PG-3 genomic
DNA are contemplated within the scope of this invention.

To carry out gene therapy strategies using the triple helix approach, the sequences of the
PG-3 genomic DNA are first scanned to identify 10-mer to 20-mer homopyrimidine or homopurine
stretches which could be used in triple-helix based strategies for inhibiting PG-3 expression.
Following identification of candidate homopyrimidine or homopurine stretches, their efficiency in
inhibiting PG-3 expression is assessed by introducing varying amounts of oligonucleotides
containing the candidate sequences into tissue culture cells which express the PG-3 gene.
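The scanning step described above can be sketched in Python as follows. The sketch reports maximal homopurine (A/G) and homopyrimidine (C/T) runs of at least 10 nucleotides; because a homopurine run on one strand is a homopyrimidine run on the other, scanning a single strand for both kinds of run is sufficient. The function name is hypothetical, and runs longer than 20 nucleotides are reported whole, on the assumption that the 10-mer to 20-mer candidates would then be chosen from within them.

```python
import re
from typing import List, Tuple


def triplex_candidates(dna: str, min_len: int = 10) -> List[Tuple[str, int, str]]:
    """Maximal homopurine (A/G) and homopyrimidine (C/T) runs of >= min_len nt.

    Returns (kind, 0-based start, run). Runs longer than 20 nt contain several
    candidate 10-mer to 20-mer windows and are reported whole.
    """
    seq = dna.upper()
    hits = []
    for kind, bases in (("homopurine", "[AG]"), ("homopyrimidine", "[CT]")):
        # e.g. "[AG]{10,}": greedy, so each match is a maximal run.
        for m in re.finditer(f"{bases}{{{min_len},}}", seq):
            hits.append((kind, m.start(), m.group()))
    return hits
```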

The oligonucleotides can be introduced into the cells using a variety of methods known to
those skilled in the art, including but not limited to calcium phosphate precipitation,
DEAE-Dextran, electroporation, liposome-mediated transfection or native uptake.

Treated cells are monitored for altered cell function or reduced PG-3 expression using
techniques such as Northern blotting, RNase protection assays, or PCR-based strategies to monitor
the transcription levels of the PG-3 gene in cells which have been treated with the oligonucleotide.

The oligonucleotides which are effective in inhibiting gene expression in tissue culture cells
may then be introduced in vivo using the techniques described above for the antisense approach, at a
dosage calculated based on the in vitro results.

In some embodiments, the natural (beta) anomers of the oligonucleotide units can be
replaced with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, an
intercalating agent such as ethidium bromide, or the like, can be attached to the 3' end of the alpha
oligonucleotide to stabilize the triple helix. For information on the generation of oligonucleotides
suitable for triple helix formation, see Griffin et al. (1989), which is hereby incorporated by
reference.

COMPUTER-RELATED EMBODIMENTS 

As used herein, the term "nucleic acid codes of the invention" encompasses the nucleotide
sequences comprising, consisting essentially of, or consisting of any one of the following: a) a
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or
1000 nucleotides of SEQ ID No 1, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10
of the following nucleotide positions of SEQ ID No 1: 1-97921, 98517-103471, 103603-108222,
108390-109221, 109324-114409, 114538-115723, 115957-122102, 122225-126876,
127033-157212, 157808-240825; b) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50,
60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 2 or the complements thereof;
and, c) a nucleotide sequence complementary to any one of the preceding nucleotide sequences.

The "nucleic acid codes of the invention" further encompass nucleotide sequences
homologous to: a) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90,
100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1, wherein said contiguous span comprises
at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1-97921,
98517-103471, 103603-108222, 108390-109221, 109324-114409, 114538-115723, 115957-122102,
122225-126876, 127033-157212, 157808-240825; b) a contiguous span of at least 12, 15, 18, 20,
25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 2 or the
complements thereof; and, c) sequences complementary to all of the preceding sequences.
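To make the position requirement of definition a) concrete, the following Python sketch checks whether a span of SEQ ID No 1, given by its 1-based start and end coordinates, is long enough and contains a minimum number of the listed positions. It does not need the sequence itself; the function names and the default thresholds (12 nucleotides, 1 listed position, the smallest values in the definition) are illustrative assumptions.

```python
from typing import List, Tuple

# Position ranges of SEQ ID No 1 listed in definition a) (1-based, inclusive).
LISTED_RANGES: List[Tuple[int, int]] = [
    (1, 97921), (98517, 103471), (103603, 108222), (108390, 109221),
    (109324, 114409), (114538, 115723), (115957, 122102), (122225, 126876),
    (127033, 157212), (157808, 240825),
]


def listed_positions_in_span(span_start: int, span_end: int) -> int:
    """Count listed positions falling inside a 1-based inclusive span."""
    covered = 0
    for low, high in LISTED_RANGES:
        overlap = min(high, span_end) - max(low, span_start) + 1
        if overlap > 0:
            covered += overlap
    return covered


def meets_definition_a(span_start: int, span_end: int,
                       min_length: int = 12, min_listed: int = 1) -> bool:
    """True if the span is long enough and contains enough listed positions."""
    if span_end < span_start:
        raise ValueError("span_end must not precede span_start")
    length = span_end - span_start + 1
    return (length >= min_length
            and listed_positions_in_span(span_start, span_end) >= min_listed)
```

For example, the 21-nucleotide span 97915-97935 contains seven listed positions (97915-97921), so it satisfies the requirement for 1, 2, 3 or 5 positions but not for 10.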

Homologous sequences refer to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%,
80%, or 75% homology to these contiguous spans. Homology may be determined using any method
described herein, including BLAST2N with the default parameters or with any modified parameters.
Homologous sequences also may include RNA sequences in which uridines replace the thymines in the
nucleic acid codes of the invention. It will be appreciated that the nucleic acid codes of the invention
can be represented in the traditional single character format (see the inside back cover of Stryer,
Lubert. 1995) or in any other format or code which records the identity of the nucleotides in a
sequence.

As used herein, the term "polypeptide codes of the invention" encompasses the polypeptide
sequences comprising a contiguous span of at least 6, 8, 10, 12, 15, 20, 25, 30, 40, 50, or 100 amino
acids of SEQ ID No 3. It will be appreciated that the polypeptide codes of the invention can be
represented in the traditional single character format or three letter format (see the inside back cover of
Stryer, Lubert.) or in any other format or code which records the identity of the amino acids in a
sequence.

It will be appreciated by those skilled in the art that the nucleic acid codes of the invention
and polypeptide codes of the invention can be stored, recorded, and manipulated on any medium
which can be read and accessed by a computer. As used herein, the words "recorded" and "stored"
refer to a process for storing information on a computer medium. A skilled artisan can readily adopt
any of the presently known methods for recording information on a computer readable medium to
generate manufactures comprising one or more of the nucleic acid codes of the invention, or one or
more of the polypeptide codes of the invention. Another aspect of the present invention is a computer
readable medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 nucleic acid codes of
the invention. Another aspect of the present invention is a computer readable medium having recorded
thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 polypeptide codes of the invention.

Computer readable media include magnetically readable media, optically readable media,
electronically readable media and magnetic/optical media. For example, the computer readable media
may be a hard disk, a floppy disk, a magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Random
Access Memory (RAM), or Read Only Memory (ROM) as well as other types of media known to
those skilled in the art.

Embodiments of the present invention include systems, particularly computer systems which
store and manipulate the sequence information described herein. One example of a computer system
100 is illustrated in block diagram form in Figure 1. As used herein, "a computer system" refers to the
hardware components, software components, and data storage components used to analyze the
nucleotide sequences of the nucleic acid codes of the invention or the amino acid sequences of the
polypeptide codes of the invention. In one embodiment, the computer system 100 is a Sun Enterprise
1000 server (Sun Microsystems, Palo Alto, CA). The computer system 100 preferably includes a
processor 105 for processing, accessing and manipulating the sequence data. The processor 105 can be
any well-known type of central processing unit, such as the Pentium III from Intel Corporation, or a
similar processor from Sun, Motorola, Compaq or International Business Machines.

Preferably, the computer system 100 is a general purpose system that comprises the processor
105 and one or more internal data storage components 110 for storing data, and one or more data
retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can
readily appreciate that any one of the currently available computer systems is suitable.



WO 01/14550 PCT/IB00/01098 


In one particular embodiment, the computer system 100 includes a processor 105 connected to
a bus which is connected to a main memory 115 (preferably implemented as RAM) and one or more
internal data storage devices 110, such as a hard drive and/or other computer readable media having
data recorded thereon. In some embodiments, the computer system 100 further includes one or more
data retrieving devices 118 for reading the data stored on the internal data storage devices 110.

The data retrieving device 118 may represent, for example, a floppy disk drive, a compact disk
drive, a magnetic tape drive, etc. In some embodiments, the internal data storage device 110 is a
removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc.
containing control logic and/or data recorded thereon. The computer system 100 may advantageously
include or be programmed by appropriate software for reading the control logic and/or the data from
the data storage component once inserted in the data retrieving device.

The computer system 100 includes a display 120 which is used to display output to a computer
user. It should also be noted that the computer system 100 can be linked to other computer systems
125a-c in a network or wide area network to provide centralized access to the computer system 100.

Software for accessing and processing the nucleotide sequences of the nucleic acid codes of
the invention or the amino acid sequences of the polypeptide codes of the invention (such as search
tools, compare tools, and modeling tools, etc.) may reside in main memory 115 during execution.

In some embodiments, the computer system 100 may further comprise a sequence comparer
for comparing the above-described nucleic acid codes of the invention or the polypeptide codes of the
invention stored on a computer readable medium to reference nucleotide or polypeptide sequences
stored on a computer readable medium. A "sequence comparer" refers to one or more programs which
are implemented on the computer system 100 to compare a nucleotide or polypeptide sequence with
other nucleotide or polypeptide sequences and/or compounds including but not limited to peptides,
peptidomimetics, and chemicals stored within the data storage means. For example, the sequence
comparer may compare the nucleotide sequences of nucleic acid codes of the invention or the amino
acid sequences of the polypeptide codes of the invention stored on a computer readable medium to
reference sequences stored on a computer readable medium to identify homologies, motifs implicated
in biological function, or structural motifs. The various sequence comparer programs identified
elsewhere in this patent specification are particularly contemplated for use in this aspect of the
invention.

Figure 2 is a flow diagram illustrating one embodiment of a process 200 for comparing a new
nucleotide or protein sequence with a database of sequences in order to determine the homology levels
between the new sequence and the sequences in the database. The database of sequences can be a
private database stored within the computer system 100, or a public database such as GENBANK, PIR
or SWISSPROT that is available through the Internet.

The process 200 begins at a start state 201 and then moves to a state 202 wherein the new
sequence to be compared is stored to a memory in a computer system 100. As discussed above, the
memory could be any type of memory, including RAM or an internal storage device.

The process 200 then moves to a state 204 wherein a database of sequences is opened for
analysis and comparison. The process 200 then moves to a state 206 wherein the first sequence stored
in the database is read into a memory on the computer. A comparison is then performed at a state 210
to determine whether the first sequence in the database is the same as the new sequence. It is important
to note that this step is not limited to performing an exact comparison between the new sequence and
the first sequence in the database. Methods for comparing two nucleotide or protein sequences, even if
they are not identical, are well known to those of skill in the art. For example, gaps can be introduced
into one sequence in order to raise the homology level between the two tested sequences. The
parameters that control whether gaps or other features are introduced into a sequence during
comparison are normally entered by the user of the computer system.

Once a comparison of the two sequences has been performed at the state 210, a determination
is made at a decision state 212 whether the two sequences are the same. Of course, the term "same" is
not limited to sequences that are absolutely identical. Sequences that are within the homology
parameters entered by the user will be marked as "same" in the process 200.

If a determination is made that the two sequences are the same, the process 200 moves to a
state 214 wherein the name of the sequence from the database is displayed to the user. This state
notifies the user that the sequence with the displayed name fulfills the homology constraints that were
entered. Once the name of the stored sequence is displayed to the user, the process 200 moves to a
decision state 218 wherein a determination is made whether more sequences exist in the database. If no
more sequences exist in the database, then the process 200 terminates at an end state 220. However, if
more sequences do exist in the database, then the process 200 moves to a state 224 wherein a pointer is
moved to the next sequence in the database so that it can be compared to the new sequence. In this
manner, the new sequence is aligned and compared with every sequence in the database.

It should be noted that if a determination had been made at the decision state 212 that the 
sequences were not homologous, then the process 200 would move immediately to the decision state 
218 in order to determine if any other sequences were available in the database for comparison. 
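Process 200 can be sketched as a loop over the database entries. In the Python sketch below, the comparison at states 210 and 212 is delegated to a caller-supplied scoring function, standing in for whatever homology program and user-entered parameters are chosen (for example a BLAST2N comparison); the function names and the dictionary-based database are hypothetical.

```python
from typing import Callable, Dict, List

# A scoring function returns a homology level (e.g. percent identity) for two sequences.
Scorer = Callable[[str, str], float]


def search_database(new_sequence: str, database: Dict[str, str],
                    threshold: float, score: Scorer) -> List[str]:
    """Compare a new sequence with every stored sequence, as in process 200.

    Returns, in database order, the names of the stored sequences whose score
    reaches the threshold, i.e. those the process would mark as "same".
    """
    hits = []
    # States 206 and 224: read each stored sequence in turn.
    for name, stored in database.items():
        # States 210 and 212: compare and decide whether the sequences are "same".
        if score(new_sequence, stored) >= threshold:
            # State 214: report the name of the matching database sequence.
            hits.append(name)
    # Decision state 218 finds no more sequences; end state 220.
    return hits
```

With a trivial exact-match scorer such as `lambda a, b: 100.0 if a == b else 0.0` and a threshold of 100, only identical entries are reported; a gapped alignment scorer would implement the looser notion of "same" described above.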

Accordingly, one aspect of the present invention is a computer system comprising a
processor, a data storage device having stored thereon a nucleic acid code of the invention or a
polypeptide code of the invention, a data storage device having retrievably stored thereon reference
nucleotide sequences or polypeptide sequences to be compared to the nucleic acid code of the
invention or polypeptide code of the invention, and a sequence comparer for conducting the
comparison. The sequence comparer may indicate a homology level between the sequences
compared or identify structural motifs in the nucleic acid code of the invention and polypeptide
codes of the invention, or it may identify structural motifs in sequences which are compared to these
nucleic acid codes and polypeptide codes. In some embodiments, the data storage device may have
stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the
invention or polypeptide codes of the invention.

Another aspect of the present invention is a method for determining the level of homology
between a nucleic acid code of the invention and a reference nucleotide sequence, comprising the
steps of reading the nucleic acid code and the reference nucleotide sequence through the use of a
computer program which determines homology levels and determining homology between the nucleic
acid code and the reference nucleotide sequence with the computer program. The computer program
may be any of a number of computer programs for determining homology levels, including those
specifically enumerated herein, including BLAST2N with the default parameters or with any modified
parameters. The method may be implemented using the computer systems described above. The
method may also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 50 of the above described nucleic
acid codes of the invention through the use of the computer program and determining homology
between the nucleic acid codes and reference nucleotide sequences.

Figure 3 is a flow diagram illustrating one embodiment of a process 250 in a computer for
determining whether two sequences are homologous. The process 250 begins at a start state 252 and
then moves to a state 254 wherein a first sequence to be compared is stored to a memory. The
second sequence to be compared is then stored to a memory at a state 256. The process 250 then
moves to a state 260 wherein the first character in the first sequence is read and then to a state 262
wherein the first character of the second sequence is read. It should be understood that if the
sequence is a nucleotide sequence, then the character would normally be either A, T, C, G or U. If
the sequence is a protein sequence, then it should be in the single letter amino acid code so that the
first and second sequences can be easily compared.

A determination is then made at a decision state 264 whether the two characters are the
same. If they are the same, then the process 250 moves to a state 268 wherein the next characters in
the first and second sequences are read. A determination is then made whether the next characters
are the same. If they are, then the process 250 continues this loop until two characters are not the
same. If a determination is made that the next two characters are not the same, the process 250
moves to a decision state 274 to determine whether there are any more characters in either sequence
to read.

If there aren't any more characters to read, then the process 250 moves to a state 276
wherein the level of homology between the first and second sequences is displayed to the user. The
level of homology is determined by calculating the proportion of characters between the sequences
that were the same out of the total number of characters in the first sequence. Thus, if every
character in a first 100 nucleotide sequence aligned with every character in a second sequence, the
homology level would be 100%.
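The proportion described for process 250 can be computed as in the Python sketch below. It follows the description literally (position-by-position comparison, no gaps, denominator equal to the length of the first sequence) and is therefore only meaningful for sequences that are already in register; the function name is hypothetical.

```python
def homology_level(first: str, second: str) -> float:
    """Percent of characters of `first` matching `second` position by position.

    Characters are read pairwise (states 260-268); the number of identical
    pairs is divided by the length of the first sequence (state 276).
    No gaps are introduced.
    """
    if not first:
        raise ValueError("first sequence is empty")
    same = sum(1 for a, b in zip(first.upper(), second.upper()) if a == b)
    return 100.0 * same / len(first)
```

As in the text, two identical 100-nucleotide sequences give 100.0, while "ACGT" compared with "ACGA" gives 75.0.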
Alternatively, the computer program may be a computer program which compares the
nucleotide sequences of the nucleic acid codes of the present invention to reference nucleotide
sequences in order to determine whether the nucleic acid code of the invention differs from a reference
nucleic acid sequence at one or more positions. Optionally, such a program records the length and
identity of inserted, deleted or substituted nucleotides with respect to the sequence of either the
reference polynucleotide or the nucleic acid code of the invention. In one embodiment, the computer
program may be a program which determines whether the nucleotide sequences of the nucleic acid
codes of the invention contain one or more single nucleotide polymorphisms (SNPs) with respect to a
reference nucleotide sequence. These single nucleotide polymorphisms may each comprise a single
base substitution, insertion, or deletion.
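A program of the kind just described could, for example, work on a pairwise alignment that has already been produced by a homology program. The Python sketch below assumes such a gapped alignment, with "-" marking gaps, and lists each substitution, insertion and deletion with its position on the ungapped reference; the alignment step itself and the function name are assumptions outside the scope of the sketch.

```python
from typing import List, Tuple


def alignment_differences(ref_aln: str, query_aln: str) -> List[Tuple[str, int, str, str]]:
    """Report differences between two pre-aligned sequences ('-' marks a gap).

    Returns (kind, ref_position, ref_base, query_base) with 1-based positions on
    the ungapped reference. For an insertion, ref_position is the reference base
    immediately preceding the inserted base.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1  # advance only on real reference bases
        if r == q:
            continue
        if r == "-":
            diffs.append(("insertion", ref_pos, "-", q))
        elif q == "-":
            diffs.append(("deletion", ref_pos, r, "-"))
        else:
            diffs.append(("substitution", ref_pos, r, q))
    return diffs
```

Single-column entries in the output correspond to the single base substitutions, insertions and deletions mentioned above; multi-base indels appear as runs of consecutive entries.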

Another aspect of the present invention is a method for determining the level of homology
between a polypeptide code of the invention and a reference polypeptide sequence, comprising the
steps of reading the polypeptide code of the invention and the reference polypeptide sequence through
use of a computer program which determines homology levels and determining homology between the
polypeptide code and the reference polypeptide sequence using the computer program.

Accordingly, another aspect of the present invention is a method for determining whether a
nucleic acid code of the invention differs at one or more nucleotides from a reference nucleotide
sequence comprising the steps of reading the nucleic acid code and the reference nucleotide
sequence through use of a computer program which identifies differences between nucleic acid
sequences and identifying differences between the nucleic acid code and the reference nucleotide
sequence with the computer program. In some embodiments, the computer program is a program
which identifies single nucleotide polymorphisms. The method may be implemented by the
computer systems described above and the method illustrated in Figure 3. The method may also be
performed by reading at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the
invention and the reference nucleotide sequences through the use of the computer program and
identifying differences between the nucleic acid codes and the reference nucleotide sequences with
the computer program.

In other embodiments the computer based system may further comprise an identifier for
identifying features within the nucleotide sequences of the nucleic acid codes of the invention or the
amino acid sequences of the polypeptide codes of the invention.

An "identifier" refers to one or more programs which identify certain features within the
above-described nucleotide sequences of the nucleic acid codes of the invention or the amino acid
sequences of the polypeptide codes of the invention. In one embodiment, the identifier may
comprise a program which identifies an open reading frame in the cDNA codes of the invention.
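One possible form of such an identifier is sketched below in Python: it returns the longest ATG-initiated open reading frame found in the three forward reading frames of a cDNA sequence. The minimum ORF length of 30 codons, the restriction to the forward strand and the function name are illustrative assumptions.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str, min_codons: int = 30) -> Optional[Tuple[int, int, str]]:
    """Longest ATG...stop open reading frame in the three forward frames.

    Returns (0-based start, exclusive end including the stop codon, ORF
    sequence), or None if no ORF of at least min_codons codons is found.
    """
    seq = cdna.upper().replace("U", "T")
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # open a candidate ORF at the first ATG in frame
            elif codon in STOP_CODONS:
                end = i + 3
                long_enough = (end - start) // 3 >= min_codons
                if long_enough and (best is None or end - start > best[1] - best[0]):
                    best = (start, end, seq[start:end])
                start = None  # look for the next ATG after this stop
    return best
```

Scanning the reverse complement in the same way would extend the sketch to both strands.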

Figure 4 is a flow diagram illustrating one embodiment of an identifier process 300 for
detecting the presence of a feature in a sequence. The process 300 begins at a start state 302 and
then moves to a state 304 wherein a first sequence that is to be checked for features is stored to a
memory 115 in the computer system 100. The process 300 then moves to a state 306 wherein a
database of sequence features is opened. Such a database would include a list of each feature's
attributes along with the name of the feature. For example, a feature name could be "Initiation
Codon" and the attribute would be "ATG". Another example would be the feature name
"TAATAA Box" and the feature attribute would be "TAATAA". An example of such a database is
produced by the University of Wisconsin Genetics Computer Group (www.gcg.com).

Once the database of features is opened at the state 306, the process 300 moves to a state
308 wherein the first feature is read from the database. A comparison of the attribute of the first
feature with the first sequence is then made at a state 310. A determination is then made at a
decision state 316 whether the attribute of the feature was found in the first sequence. If the
attribute was found, then the process 300 moves to a state 318 wherein the name of the found
feature is displayed to the user.

The process 300 then moves to a decision state 320 wherein a determination is made
whether more features exist in the database. If no more features exist, then the process 300
terminates at an end state 324. However, if more features do exist in the database, then the process
300 reads the next sequence feature at a state 326 and loops back to the state 310 wherein the
attribute of the next feature is compared against the first sequence.

It should be noted that if the feature attribute is not found in the first sequence at the
decision state 316, the process 300 moves directly to the decision state 320 in order to determine
whether any more features exist in the database.
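The loop of process 300 can be sketched as follows. This is a simplified illustration that assumes the feature database is an in-memory list of (name, attribute) pairs and that an attribute is "found" when it occurs as an exact substring of the sequence; the state numbers in the comments map the code onto Figure 4, and the function name is hypothetical.

```python
# Simplified sketch of identifier process 300 (Figure 4). The feature
# database is assumed to be a list of (feature name, attribute) pairs.
FeatureDB = list[tuple[str, str]]


def identify_features(sequence: str, feature_db: FeatureDB) -> list[str]:
    seq = sequence.upper()               # state 304: sequence held in memory
    found = []
    for name, attribute in feature_db:   # states 308/326: read next feature
        # states 310/316: compare the attribute with the sequence
        if attribute.upper() in seq:
            found.append(name)           # state 318: report the feature name
    return found                         # state 324: no more features


features = [("Initiation Codon", "ATG"), ("TAATAA Box", "TAATAA")]
print(identify_features("ggTAATAAcgATGcc", features))
```

Features whose attribute is absent are simply skipped, which corresponds to the direct transition from decision state 316 to decision state 320.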

In another embodiment, the identifier may comprise a molecular modeling program which
determines the three-dimensional structure of the polypeptide codes of the invention. In some
embodiments, the molecular modeling program identifies target sequences that are most compatible
with profiles representing the structural environments of the residues in known three-dimensional
protein structures (see, e.g., Eisenberg et al., U.S. Patent No. 5,436,850, issued July 25, 1995). In
another technique, the known three-dimensional structures of proteins in a given family are
superimposed to define the structurally conserved regions in that family. This protein modeling
technique also uses the known three-dimensional structure of a homologous protein to approximate
the structure of the polypeptide codes of the invention (see, e.g., Srinivasan et al., U.S. Patent
No. 5,557,535, issued September 17, 1996). Conventional homology modeling techniques have
been used routinely to build models of proteases and antibodies (Sowdhamini et al. (1997)).
Comparative approaches can also be used to develop three-dimensional protein models when the
protein of interest has poor sequence identity to template proteins. In some cases, proteins fold into
similar three-dimensional structures despite having very weak sequence identity. For example, a
number of helical cytokines fold into a similar three-dimensional topology in spite of weak sequence
homology.
The recent development of threading methods now enables the identification of likely
folding patterns in a number of situations where the structural relatedness between target and
template(s) is not detectable at the sequence level. Hybrid methods have also been developed in which
fold recognition is performed using Multiple Sequence Threading (MST), structural equivalencies are
deduced from the threading output using the distance geometry program DRAGON to construct a
low-resolution model, and a full-atom representation is constructed using a molecular modeling
package such as QUANTA.

According to this three-step approach, candidate templates are first identified by using the
novel fold recognition algorithm MST, which is capable of performing simultaneous threading of
multiple aligned sequences onto one or more 3-D structures. In a second step, the structural
equivalencies obtained from the MST output are converted into interresidue distance restraints and
fed into the distance geometry program DRAGON, together with auxiliary information obtained
from secondary structure predictions. The program combines the restraints in an unbiased manner
and rapidly generates a large number of low-resolution model conformations. In a third step, these
low-resolution model conformations are converted into full-atom models and subjected to energy
minimization using the molecular modeling package QUANTA (see, e.g., Aszodi et al. (1997)).
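To make the second step more concrete, the sketch below shows one simple way structural equivalencies could be turned into interresidue distance restraints: for every pair of aligned target residues, the C-alpha distance between the equivalent template residues is kept as a restraint if it falls below a cutoff. This is an illustrative assumption and not the MST or DRAGON implementation; the 8.0 Å cutoff, the residue numbering and the function name are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def restraints_from_equivalences(
    template_ca: dict[int, Coord],
    equivalences: dict[int, int],
    cutoff: float = 8.0,
) -> list[tuple[int, int, float]]:
    """Return (target_i, target_j, distance) restraints.

    template_ca maps template residue number -> C-alpha coordinates (Å);
    equivalences maps target residue number -> equivalent template residue.
    """
    pairs = sorted(equivalences.items())
    restraints = []
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            (ti, si), (tj, sj) = pairs[a], pairs[b]
            d = math.dist(template_ca[si], template_ca[sj])
            if d <= cutoff:  # keep only short-range distances as restraints
                restraints.append((ti, tj, round(d, 2)))
    return restraints


template = {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0), 20: (20.0, 0.0, 0.0)}
print(restraints_from_equivalences(template, {1: 10, 2: 11, 3: 20}))
```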

The results of the molecular modeling analysis may then be used in rational drug design
techniques to identify agents which modulate the activity of the polypeptide codes of the invention.
Accordingly, another aspect of the present invention is a method of identifying a feature
within the nucleic acid codes of the invention or the polypeptide codes of the invention, comprising
reading the nucleic acid code(s) or the polypeptide code(s) through the use of a computer program
which identifies features therein, and identifying features within the nucleic acid code(s) or
polypeptide code(s) with the computer program. In one embodiment, the computer program comprises
a computer program which identifies open reading frames. In a further embodiment, the computer
program identifies structural motifs in a polypeptide sequence. In another embodiment, the
computer program comprises a molecular modeling program. The method may be performed by
reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the
invention or the polypeptide codes of the invention through the use of the computer program and
identifying features within the nucleic acid codes or polypeptide codes with the computer program.

The nucleic acid codes of the invention or the polypeptide codes of the invention may be
stored and manipulated in a variety of data processor programs in a variety of formats. For example,
they may be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT, or
as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2,
SYBASE, or ORACLE. In addition, many computer programs and databases may be used as sequence
comparers, identifiers, or sources of reference nucleotide or polypeptide sequences to be compared to
the nucleic acid codes of the invention or the polypeptide codes of the invention. The following list
is intended not to limit the invention but to provide guidance to programs and databases which are
useful with the nucleic acid codes of the invention or the polypeptide codes of the invention. The
programs and databases which may be used include, but are not limited to: MacPattern (EMBL),
DiscoveryBase (Molecular Applications Group), GeneMine (Molecular Applications Group), Look
(Molecular Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2
(NCBI), BLASTN and BLASTX (Altschul et al., 1990), FASTA (Pearson and Lipman, 1988),
FASTDB (Brutlag et al., 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular
Simulations Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations
Inc.), Insight II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm
(Molecular Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.),
QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler
(Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular
Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular
Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.),
the EMBL/Swissprotein database, the MDL Available Chemicals Directory database, the MDL Drug
Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug
Index database, the BioByteMasterFile database, the GenBank database, the Genseqn database and the
Genseqp database. Many other programs and databases would be apparent to one of skill in the art
given the present disclosure.

Motifs which may be detected using the above programs include sequences encoding
leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices and
beta sheets; signal sequences encoding signal peptides which direct the secretion of the encoded
proteins; sequences implicated in transcription regulation, such as homeoboxes and acidic stretches;
enzymatic active sites; substrate binding sites; and enzymatic cleavage sites.
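As an example of motif detection, the N-glycosylation consensus N-{P}-[ST]-{P} (asparagine; any residue except proline; serine or threonine; any residue except proline; PROSITE entry PS00001) can be searched with a regular expression. The sketch below is illustrative only and the function name is hypothetical; dedicated motif scanners such as those listed above rely on curated pattern and profile libraries.

```python
import re

# N-{P}-[ST]-{P}; the zero-width lookahead lets overlapping sites be reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(protein: str) -> list[int]:
    """Return 1-based positions of the Asn residue of each candidate site."""
    return [m.start() + 1 for m in N_GLYCOSYLATION.finditer(protein.upper())]


print(n_glycosylation_sites("MNKSANNSSA"))  # -> [2, 6, 7]
```

A lookahead is used instead of a plain match so that adjacent candidate sites such as NNSS are both reported.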

Throughout this application, various publications, patents and published patent applications
are cited. The disclosures of these publications, patents and published patent specifications
referenced in this application are hereby incorporated by reference into the present disclosure to
more fully describe the state of the art to which this invention pertains.

EXAMPLES 

EXAMPLE 1

IDENTIFICATION OF BIALLELIC MARKERS - DNA EXTRACTION
Donors were unrelated and healthy. They were sufficiently diverse to be representative of a
heterogeneous French population. The DNA from 100 individuals was extracted and tested for the
detection of the biallelic markers.
30 ml of peripheral venous blood were taken from each donor in the presence of EDTA.
Cells (pellet) were collected after centrifugation for 10 minutes at 2000 rpm. Red cells were lysed
with a lysis solution (50 ml final volume: 10 mM Tris pH 7.6; 5 mM MgCl2; 10 mM NaCl). After
resuspension of the pellet in the lysis solution, the solution was centrifuged (10 minutes, 2000 rpm)
as many times as necessary to eliminate the residual red cells present in the supernatant.

The pellet of white cells was lysed overnight at 42°C with 3.7 ml of lysis solution
composed of:

- 3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM) / NaCl 0.4 M
- 200 µl SDS 10%
- 500 µl K-proteinase (2 mg K-proteinase in TE 10-2 / NaCl 0.4 M).



For the extraction of proteins, 1 ml saturated NaCl (6M) (1/3.5 v/v) was added. After
vigorous agitation, the solution was centrifuged for 20 minutes at 10000 rpm.

For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the previous
supernatant, and the solution was centrifuged for 30 minutes at 2000 rpm. The DNA was
rinsed three times with 70% ethanol to eliminate salts, and centrifuged for 20 minutes at 2000 rpm.
The pellet was dried at 37°C and resuspended in 1 ml TE 10-1 or 1 ml water. The DNA
concentration was evaluated by measuring the OD at 260 nm (1 OD unit = 50 µg/ml DNA).

To determine the presence of proteins in the DNA solution, the OD 260 / OD 280 ratio was
determined. Only DNA preparations having an OD 260 / OD 280 ratio between 1.8 and 2 were used
in the subsequent examples described below.
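The two spectrophotometric checks described above amount to simple arithmetic. The sketch below assumes readings taken with a 1 cm path length and an optional dilution factor; the function names are illustrative.

```python
def dna_concentration_ug_per_ml(od260: float, dilution: float = 1.0) -> float:
    # 1 OD unit at 260 nm corresponds to 50 ug/ml of DNA (see text above).
    return od260 * 50.0 * dilution


def is_protein_free(od260: float, od280: float,
                    low: float = 1.8, high: float = 2.0) -> bool:
    # Preparations are kept only if OD260/OD280 lies between 1.8 and 2.
    return low <= od260 / od280 <= high


print(dna_concentration_ug_per_ml(0.25, dilution=10))  # -> 125.0
print(is_protein_free(0.95, 0.50), is_protein_free(0.70, 0.50))  # -> True False
```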

The pool was constituted by mixing equivalent quantities of DNA from each individual. 
EXAMPLE 2

IDENTIFICATION OF BIALLELIC MARKERS: AMPLIFICATION OF GENOMIC 

DNA BY PCR 

The amplification of specific genomic sequences of the DNA samples of Example 1 was
carried out on the pool of DNA obtained previously. In addition, 50 individual samples were
similarly amplified.

PCR assays were performed using the following protocol: 



Final volume                                            25 µl
DNA                                                     2 ng/µl
MgCl2                                                   2 mM
dNTP (each)                                             200 µM
primer (each)                                           2.9 ng/µl
AmpliTaq Gold DNA polymerase                            0.05 unit/µl
PCR buffer (10x = 0.1 M Tris-HCl pH 8.3, 0.5 M KCl)     1x
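Because the protocol is expressed as final concentrations, the amount of each component in one 25 µl reaction follows by multiplying by the reaction volume. The sketch below only performs that multiplication for the components given per microlitre; the dictionary keys are illustrative labels.

```python
REACTION_VOLUME_UL = 25.0

# Final concentrations per microlitre, taken from the protocol above.
FINAL_CONCENTRATIONS = {
    "template DNA (ng)": 2.0,
    "each primer (ng)": 2.9,
    "AmpliTaq Gold (units)": 0.05,
}


def amounts_per_reaction(concs: dict[str, float],
                         volume_ul: float = REACTION_VOLUME_UL) -> dict[str, float]:
    # amount = final concentration x final volume (rounded to hide float noise)
    return {name: round(c * volume_ul, 3) for name, c in concs.items()}


print(amounts_per_reaction(FINAL_CONCENTRATIONS))
```

This gives 50 ng of template DNA, 72.5 ng of each primer and 1.25 units of polymerase per reaction.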



Each pair of first primers was designed using the sequence information of the PG-3 gene
disclosed herein and the OSP software (Hillier & Green, 1991). This first pair of primers was about
20 nucleotides in length and had the sequences disclosed in Table 1 in the columns labeled PU and
RP.



Table 1

Amplicon | Position range of the amplicon in SEQ ID No:1 | PU primer name | Position range of amplification primer in SEQ ID No:1 | RP primer name | Complementary position range of amplification primer in SEQ ID No:1
5-390 | 1823-2125 | B1 | 1823-1840 | C1 | 2108-2125
5-391 | 4559-4908 | B2 | 4559-4577 | C2 | 4891-4908
5-392 | 10007-10430 | B3 | 10007-10025 | C3 | 10411-10430
4-59 | 39556-39970 | B4 | 39556-39574 | C4 | 39953-39970
4-58 | 39877-40259 | B5 | 39877-39896 | C5 | 40242-40259
4-54 | 41137-41581 | B6 | 41137-41154 | C6 | 41564-41581
4-51 | 42122-42543 | B7 | 42122-42141 | C7 | 42526-42543
99-86 | 67289-67741 | B8 | 67289-67309 | C8 | 67724-67741
4-88 | 69182-69626 | B9 | 69182-69200 | C9 | 69609-69626
5-397 | 72698-73117 | B10 | 72698-72715 | C10 | 73099-73117
5-398 | 75858-76306 | B11 | 75858-75877 | C11 | 76289-76306
99-12738 | 81006-81485 | B12 | 81006-81025 | C12 | 81466-81485
99-109 | 83564-84007 | B13 | 83564-83582 | C13 | 83990-84007
99-12749 | 91743-92142 | B14 | 91743-91763 | C14 | 92123-92142
4-21 | 95196-95619 | B15 | 95196-95214 | C15 | 95600-95619
4-23 | 95865-96229 | B16 | 95865-95882 | C16 | 96210-96229
99-12753 | 97261-97747 | B17 | 97261-97278 | C17 | 97728-97747
5-364 | 97831-98275 | B18 | 97831-97849 | C18 | 98256-98275
99-12755 | 98638-99131 | B19 | 98638-98656 | C19 | 99111-99131
4-87 | 103376-103818 | B20 | 103376-103395 | C20 | 103801-103818
99-12757 | 104081-104636 | B21 | 104081-104100 | C21 | 104619-104636
99-12758 | 106272-106799 | B22 | 106272-106291 | C22 | 106780-106799
4-105 | 108200-108412 | B23 | 108200-108218 | C23 | 108390-108412
4-45 | 108223-108520 | B24 | 108223-108246 | C24 | 108499-108520
4-44 | 109123-109471 | B25 | 109123-109142 | C25 | 109454-109471
4-86 | 114217-114663 | B26 | 114217-114234 | C26 | 114646-114663
4-84 | 115630-116049 | B27 | 115630-115647 | C27 | 116031-116049
99-78 | 121991-122401 | B28 | 121991-122011 | C28 | 122384-122401
99-12767 | 123089-123583 | B29 | 123089-123106 | C29 | 123565-123583
4-80 | 126711-127065 | B30 | 126711-126729 | C30 | 127048-127065
4-36 | 128162-128590 | B31 | 128162-128179 | C31 | 128573-128590
4-35 | 128480-128926 | B32 | 128480-128497 | C32 | 128909-128926
99-12771 | 130747-131273 | B33 | 130747-130764 | C33 | 131254-131273
99-12774 | 132873-133325 | B34 | 132873-132892 | C34 | 133305-133325
99-12776 | 135029-135478 | B35 | 135029-135048 | C35 | 135458-135478
99-12781 | 139277-139742 | B36 | 139277-139296 | C36 | 139724-139742
4-104 | 157181-157832 | B37 | 157181-157199 | C37 | 157814-157832
99-12818 | 172692-173091 | B38 | 172692-172709 | C38 | 173072-173091
99-24807 | 180248-180892 | B39 | 180248-180268 | C39 | 180874-180892
99-12827 | 184662-185156 | B40 | 184662-184680 | C40 | 185138-185156
99-12831 | 190178-190663 | B41 | 190178-190196 | C41 | 190643-190663
99-12832 | 191011-191460 | B42 | 191011-191030 | C42 | 191441-191460
99-12836 | 195099-195587 | B43 | 195099-195116 | C43 | 195568-195587
99-12844 | 203585-204115 | B44 | 203585-203602 | C44 | 204095-204115
4-24 | 210079-210495 | B45 | 210079-210096 | C45 | 210476-210495
4-27 | 210979-211401 | B46 | 210979-210996 | C46 | 211382-211401
5-400 | 215852-216271 | B47 | 215852-215870 | C47 | 216253-216271
99-12852 | 216213-216728 | B48 | 216213-216231 | C48 | 216708-216728
4-37 | 221530-221973 | B49 | 221530-221549 | C49 | 221956-221973
5-270 | 225554-225845 | B50 | 225554-225572 | C50 | 225827-225845
99-12860 | 229341-229790 | B51 | 229341-229359 | C51 | 229770-229790
5-402 | 237412-237766 | B52 | 237412-237429 | C52 | 237747-237766



Preferably, the primers contained a common oligonucleotide tail upstream of the specific
bases targeted for amplification; this tail was useful for sequencing.

Primers PU contain the following additional PU 5' sequence:
TGTAAAACGACGGCCAGT; primers RP contain the following RP 5' sequence:
CAGGAAACAGCTATGACC. The primer containing the additional PU 5' sequence is listed in
SEQ ID No 4. The primer containing the additional RP 5' sequence is listed in SEQ ID No 5.
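Since both 18-nucleotide tails are carried into the PCR product, each amplicon is 36 nucleotides longer than its span in SEQ ID No:1. The sketch below illustrates this bookkeeping with coordinates taken from Table 1 (amplicon 5-390, primers B1 and C1); the function names are illustrative, and the gene-specific primer sequences themselves are not reproduced.

```python
PU_TAIL = "TGTAAAACGACGGCCAGT"  # 5' tail carried by PU primers
RP_TAIL = "CAGGAAACAGCTATGACC"  # 5' tail carried by RP primers


def span_length(start: int, end: int) -> int:
    """Length of an inclusive position range in SEQ ID No:1."""
    return end - start + 1


def tailed_product_length(amplicon_start: int, amplicon_end: int) -> int:
    # Both 5' tails are incorporated at the two ends of the PCR product.
    return span_length(amplicon_start, amplicon_end) + len(PU_TAIL) + len(RP_TAIL)


print(span_length(1823, 1840), span_length(2108, 2125))  # primers B1, C1 -> 18 18
print(span_length(1823, 2125), tailed_product_length(1823, 2125))  # amplicon 5-390 -> 303 339
```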

The synthesis of these primers was performed following the phosphoramidite method, on a
GENSET UFPS 24.1 synthesizer.
DNA amplification was performed on a Genius II thermocycler. After heating at 95°C for
10 min, 40 cycles were performed. Each cycle comprised 30 sec at 95°C, 1 min at 54°C and 30
sec at 72°C. For final elongation, 10 min at 72°C ended the amplification. The quantities of the
amplification products obtained were determined on 96-well microtiter plates, using a fluorometer
and PicoGreen as the intercalating agent (Molecular Probes).
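The nominal duration of this cycling program can be checked with a short calculation. The sketch counts hold times only and ignores temperature ramping, which depends on the thermocycler, so an actual run takes longer; the function name is illustrative.

```python
def program_minutes(initial_s: int, cycles: int,
                    cycle_steps_s: list[int], final_s: int) -> float:
    # hold times only: initial denaturation + cycles x (sum of step times) + final elongation
    return (initial_s + cycles * sum(cycle_steps_s) + final_s) / 60


# 10 min at 95°C; 40 x (30 s at 95°C, 1 min at 54°C, 30 s at 72°C); 10 min at 72°C
print(program_minutes(600, 40, [30, 60, 30], 600))  # -> 100.0
```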
EXAMPLE 3

IDENTIFICATION OF BIALLELIC MARKERS: SEQUENCING OF AMPLIFIED
GENOMIC DNA AND IDENTIFICATION OF POLYMORPHISMS
The sequencing of the amplified DNA obtained in Example 2 was carried out on ABI 377
sequencers. The sequences of the amplification products were determined using automated dideoxy
terminator sequencing reactions with a dye terminator cycle sequencing protocol. The products of
the sequencing reactions were run on sequencing gels and the sequences were determined using gel
image analysis (ABI Prism DNA Sequencing Analysis software, version 2.12).

The sequence data were further evaluated to detect the presence of biallelic markers within
the amplified fragments. The polymorphism search was based on the presence of superimposed
peaks in the electrophoresis pattern resulting from different bases occurring at the same position, as
described previously.
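A minimal sketch of such a superimposed-peak search is shown below. It assumes the trace has already been reduced to one signal height per base (A, C, G, T) at each position, and flags a position when the second-highest signal reaches a fixed fraction of the highest. The 0.3 threshold, the data layout and the function name are hypothetical and are not the criteria used in this example.

```python
def superimposed_peak_positions(trace: list[dict[str, float]],
                                min_ratio: float = 0.3) -> list[int]:
    """Return 0-based positions whose second-highest peak is >= min_ratio of the highest."""
    candidates = []
    for pos, signals in enumerate(trace):
        top, second = sorted(signals.values(), reverse=True)[:2]
        if top > 0 and second / top >= min_ratio:
            candidates.append(pos)
    return candidates


trace = [
    {"A": 100, "C": 5, "G": 3, "T": 2},   # single clean peak
    {"A": 60, "C": 2, "G": 55, "T": 1},   # A and G superimposed
    {"A": 1, "C": 4, "G": 2, "T": 90},    # single clean peak
]
print(superimposed_peak_positions(trace))  # -> [1]
```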
In the 52 fragments of amplification, 80 biallelic markers were detected. The localization
of these biallelic markers is shown in Table 2.



Table 2

Amplicon | BM | Marker name | Localization in PG-3 gene | Polymorphism all1 | Polymorphism all2 | BM position in SEQ ID No:1 | BM position in SEQ ID No:2 | Position of amino acid in SEQ ID No:3
5-390 | A1 | 5-390-177 | 5' regulatory | G | C | 1999 | |
5-391 | A2 | 5-391-43 | Intron A-B | A | G | 4601 | |
5-392 | A3 | 5-392-222 | Exon C | G | T | 10228 | 285 | 76 = V
5-392 | A4 | 5-392-280 | Intron C-D | G | T | 10286 | |
5-392 | A5 | 5-392-364 | Intron C-D | G | - | 10370 | |
4-59 | A6 | 4-58-318 | Exon T | G | T | 39944 | 968 | 304 = R or I
4-58 | A7 | 4-58-289 | Exon T | G | C | 39973 | 997 | 314 = H or D
4-54 | A8 | 4-54-199 | Intron T-G | A | C | 41385 | |
4-54 | A9 | 4-54-180 | Intron T-G | A | C | 41404 | |
4-51 | A10 | 4-51-312 | Intron T-G | G | C | 42232 | |
99-86 | A11 | 99-86-266 | Intron G-H | A | G | 67475 | |
4-88 | A12 | 4-88-107 | Intron G-H | A | G | 69521 | |
5-397 | A13 | 5-397-141 | Intron G-H | G | T | 72838 | |
5-398 | A14 | 5-398-203 | Exon I | A | C | 76060 | 2102 | 682 = T or N
99-12738 | A15 | 99-12738-248 | Intron I-J | A | C | 81253 | |
99-109 | A16 | 99-109-358 | Intron I-J | A | C | 83921 | |
99-12749 | A17 | 99-12749-175 | Intron I-J | C | T | 91917 | |
4-21 | A18 | 4-21-154 | Intron J-K | C | T | 95349 | |
4-21 | A19 | 4-21-317 | Intron J-K | G | T | 95511 | |
4-23 | A20 | 4-23-326 | Intron J-K | A | G | 96190 | |
99-12753 | A21 | 99-12753-34 | Intron J-K | A | T | 97294 | |
5-364 | A22 | 5-364-252 | Intron J-K | G | T | 98024 | |
99-12755 | A23 | 99-12755-280 | Intron J-K | A | G | 98914 | |
99-12755 | A24 | 99-12755-329 | Intron J-K | A | C | 98963 | |
4-87 | A25 | 4-87-212 | Intron J-K | A | G | 103593 | |
99-12757 | A26 | 99-12757-318 | Intron J-K | C | T | 104398 | |
99-12758 | A27 | 99-12758-102 | Intron J-K | A | G | 106373 | |
99-12758 | A28 | 99-12758-136 | Intron J-K | C | T | 106407 | |
4-105 | A29 | 4-105-98 | Intron J-K | A | G | 108315 | |
4-105 | A30 | 4-105-86 | Intron J-K | A | G | 108327 | |
4-45 | A31 | 4-45-49 | Intron J-K | C | T | 108472 | |
4-44 | A32 | 4-44-277 | Intron J-K | C | T | 109196 | |
4-86 | A33 | 4-86-60 | Intron J-K | G | C | 114604 | |
4-84 | A34 | 4-84-334 | Intron J-K | A | G | 115716 | |
99-78 | A35 | 99-78-321 | Intron J-K | A | T | 122083 | |
99-12767 | A36 | 99-12767-36 | Intron J-K | G | C | 123124 | |
99-12767 | A37 | 99-12767-143 | Intron J-K | C | T | 123231 | |
Table 3

BM | Marker name | Position range of probes in SEQ ID No:1 | Probes
A1 | 5-390-177 | 1987-2011 | P1
A2 | 5-391-43 | 4589-4613 | P2
A3 | 5-392-222 | 10216-10240 | P3
A4 | 5-392-280 | 10274-10298 | P4
A6 | 4-58-318 | 39932-39956 | P6
A7 | 4-58-289 | 39961-39985 | P7
A8 | 4-54-199 | 41373-41397 | P8
A9 | 4-54-180 | 41392-41416 | P9
A10 | 4-51-312 | 42220-42244 | P10
A11 | 99-86-266 | 67463-67487 | P11
A12 | 4-88-107 | 69509-69533 | P12
A13 | 5-397-141 | 72826-72850 | P13
A14 | 5-398-203 | 76048-76072 | P14
A15 | 99-12738-248 | 81241-81265 | P15
A16 | 99-109-358 | 83909-83933 | P16
A17 | 99-12749-175 | 91905-91929 | P17
A18 | 4-21-154 | 95337-95361 | P18
A19 | 4-21-317 | 95499-95523 | P19
A20 | 4-23-326 | 96178-96202 | P20
A21 | 99-12753-34 | 97282-97306 | P21
A22 | 5-364-252 | 98012-98036 | P22
A23 | 99-12755-280 | 98902-98926 | P23
A24 | 99-12755-329 | 98951-98975 | P24
A25 | 4-87-212 | 103581-103605 | P25
A26 | 99-12757-318 | 104386-104410 | P26
A27 | 99-12758-102 | 106361-106385 | P27
A28 | 99-12758-136 | 106395-106419 | P28
A29 | 4-105-98 | 108303-108327 | P29
A30 | 4-105-86 | 108315-108339 | P30
A31 | 4-45-49 | 108460-108484 | P31
A32 | 4-44-277 | 109184-109208 | P32
A33 | 4-86-60 | 114592-114616 | P33
A34 | 4-84-334 | 115704-115728 | P34
A35 | 99-78-321 | 122071-122095 | P35
A36 | 99-12767-36 | 123112-123136 | P36
A37 | 99-12767-143 | 123219-123243 | P37
A38 | 99-12767-189 | 123265-123289 | P38
A39 | 99-12767-380 | 123456-123480 | P39
A40 | 4-80-328 | 126726-126750 | P40
A41 | 4-36-384 | 128198-128222 | P41
A42 | 4-36-264 | 128318-128342 | P42
A43 | 4-36-261 | 128321-128345 | P43
A44 | 4-35-333 | 128582-128606 | P44
A45 | 4-35-240 | 128675-128699 | P45
A46 | 4-35-173 | 128742-128766 | P46
A47 | 4-35-133 | 128782-128806 | P47
A48 | 99-12771-59 | 130793-130817 | P48
A49 | 99-12774-334 | 133194-133218 | P49
A50 | 99-12776-358 | 135374-135398 | P50
A51 | 99-12781-113 | 139377-139401 | P51
A52 | 4-104-298 | 157523-157547 | P52
A53 | 4-104-254 | 157567-157591 | P53
A54 | 4-104-250 | 157571-157595 | P54
A55 | 4-104-214 | 157607-157631 | P55
A56 | 99-12818-289 | 172968-172992 | P56
A57 | 99-24807-271 | 180610-180634 | P57
A58 | 99-24807-84 | 180797-180821 | P58
A59 | 99-12831-157 | 190322-190346 | P59
A60 | 99-12831-241 | 190406-190430 | P60
A61 | 99-12832-387 | 191385-191409 | P61
A62 | 99-12836-30 | 195116-195140 | P62
A63 | 99-12844-262 | 203834-203858 | P63
A64 | 4-24-74 | 210139-210163 | P64
A65 | 4-24-246 | 210309-210333 | P65
A66 | 4-24-314 | 210377-210401 | P66
A67 | 4-27-190 | 211156-211180 | P67
A68 | 5-400-145 | 215984-216008 | P68
A69 | 5-400-149 | 215988-216012 | P69
A70 | 5-400-175 | 216014-216038 | P70
A71 | 5-400-231 | 216070-216094 | P71
A72 | 5-400-367 | 216206-216230 | P72
A73 | 99-12852-110 | 216310-216334 | P73
A74 | 99-12852-325 | 216525-216549 | P74
A75 | 4-37-326 | 221637-221661 | P75
A76 | 4-37-107 | 221855-221879 | P76
A77 | 5-270-92 | 225633-225657 | P77
A78 | 99-12860-47 | 229375-229399 | P78
A79 | 99-12860-57 | 229385-229409 | P79
A80 | 5-402-144 | 237543-237567 | P80
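The probe coordinates in Table 3 follow a regular pattern: each probe spans 25 nucleotides, with the biallelic marker at its central (13th) position, i.e. 12 nucleotides on each side. The check below uses the marker positions of A1 and A6 given in Table 2; it illustrates that pattern and is not part of the original protocol. The function name is hypothetical.

```python
def probe_range(marker_position: int, flank: int = 12) -> tuple[int, int]:
    # 25-mer probe centred on the polymorphic base (flank nucleotides on each side)
    return marker_position - flank, marker_position + flank


print(probe_range(1999))   # marker A1 -> (1987, 2011)
print(probe_range(39944))  # marker A6 -> (39932, 39956)
```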
EXAMPLE 4

VALIDATION OF THE POLYMORPHISMS THROUGH MICROSEQUENCING 
The biallelic markers identified in Example 3 were further confirmed and their respective
frequencies were determined through microsequencing. Microsequencing was carried out for each
individual DNA sample described in Example 1.

Amplification from genomic DNA of individuals was performed by PCR as described 
above for the detection of the biallelic markers with the same set of PCR primers (Table 1). 

The preferred primers used in microsequencing were about 19 nucleotides in length and
hybridized just upstream of the considered polymorphic base. According to the invention, the
primers used in microsequencing are detailed in Table 4.

Table 4

Marker name | BM | Mis 1 | Position range of microsequencing primer Mis 1 in SEQ ID No:1 | Mis 2 | Complementary position range of microsequencing primer Mis 2 in SEQ ID No:1
5-390-177 | A1 | D1 | 1980-1998 | E1 | 2000-2018
5-391-43 | A2 | D2 | 4582-4600 | E2 | 4602-4620
5-392-222 | A3 | D3 | 10209-10227 | E3 | 10229-10247
5-392-280 | A4 | D4 | 10267-10285 | E4 | 10287-10305
4-58-318 | A6 | D6 | 39925-39943 | E6 | 39945-39963
4-58-289 | A7 | D7 | 39954-39972 | E7 | 39974-39992
4-54-199 | A8 | D8 | 41366-41384 | E8 | 41386-41404
4-54-180 | A9 | D9 | 41385-41403 | E9 | 41405-41423
4-51-312 | A10 | D10 | 42213-42231 | E10 | 42233-42251
99-86-266 | A11 | D11 | 67456-67474 | E11 | 67476-67494
4-88-107 | A12 | D12 | 69502-69520 | E12 | 69522-69540
5-397-141 | A13 | D13 | 72819-72837 | E13 | 72839-72857
5-398-203 | A14 | D14 | 76041-76059 | E14 | 76061-76079
99-12738-248 | A15 | D15 | 81234-81252 | E15 | 81254-81272
99-109-358 | A16 | D16 | 83902-83920 | E16 | 83922-83940
99-12749-175 | A17 | D17 | 91898-91916 | E17 | 91918-91936
4-21-154 | A18 | D18 | 95330-95348 | E18 | 95350-95368
4-21-317 | A19 | D19 | 95492-95510 | E19 | 95512-95530
4-23-326 | A20 | D20 | 96171-96189 | E20 | 96191-96209
99-12753-34 | A21 | D21 | 97275-97293 | E21 | 97295-97313
5-364-252 | A22 | D22 | 98005-98023 | E22 | 98025-98043
99-12755-280 | A23 | D23 | 98895-98913 | E23 | 98915-98933
99-12755-329 | A24 | D24 | 98944-98962 | E24 | 98964-98982
4-87-212 | A25 | D25 | 103574-103592 | E25 | 103594-103612
99-12757-318 | A26 | D26 | 104379-104397 | E26 | 104399-104417
99-12758-102 | A27 | D27 | 106354-106372 | E27 | 106374-106392
99-12758-136 | A28 | D28 | 106388-106406 | E28 | 106408-106426
4-105-98 | A29 | D29 | 108296-108314 | E29 | 108316-108334
4-105-86 | A30 | D30 | 108308-108326 | E30 | 108328-108346
4-45-49 | A31 | D31 | 108453-108471 | E31 | 108473-108491
4-44-277 | A32 | D32 | 109177-109195 | E32 | 109197-109215
4-86-60 | A33 | D33 | 114585-114603 | E33 | 114605-114623
4-84-334 | A34 | D34 | 115697-115715 | E34 | 115717-115735
99-78-321 | A35 | D35 | 122064-122082 | E35 | 122084-122102
99-12767-36 | A36 | D36 | 123105-123123 | E36 | 123125-123143
99-12767-143 | A37 | D37 | 123212-123230 | E37 | 123232-123250
99-12767-189 | A38 | D38 | 123258-123276 | E38 | 123278-123296
99-12767-380 | A39 | D39 | 123449-123467 | E39 | 123469-123487
4-80-328 | A40 | D40 | 126719-126737 | E40 | 126739-126757
4-36-384 | A41 | D41 | 128191-128209 | E41 | 128211-128229
4-36-264 | A42 | D42 | 128311-128329 | E42 | 128331-128349
4-36-261 | A43 | D43 | 128314-128332 | E43 | 128334-128352
4-35-333 | A44 | D44 | 128575-128593 | E44 | 128595-128613
4-35-240 | A45 | D45 | 128668-128686 | E45 | 128688-128706
4-35-173 | A46 | D46 | 128735-128753 | E46 | 128755-128773
4-35-133 | A47 | D47 | 128775-128793 | E47 | 128795-128813
99-12771-59 | A48 | D48 | 130786-130804 | E48 | 130806-130824
99-12774-334 | A49 | D49 | 133187-133205 | E49 | 133207-133225
99-12776-358 | A50 | D50 | 135367-135385 | E50 | 135387-135405
99-12781-113 | A51 | D51 | 139370-139388 | E51 | 139390-139408
4-104-298 | A52 | D52 | 157516-157534 | E52 | 157536-157554
4-104-254 | A53 | D53 | 157560-157578 | E53 | 157580-157598
4-104-250 | A54 | D54 | 157564-157582 | E54 | 157584-157602
4-104-214 | A55 | D55 | 157600-157618 | E55 | 157620-157638
99-12818-289 | A56 | D56 | 172961-172979 | E56 | 172981-172999
99-24807-271 | A57 | D57 | 180603-180621 | E57 | 180623-180641
99-24807-84 | A58 | D58 | 180790-180808 | E58 | 180810-180828
99-12831-157 | A59 | D59 | 190315-190333 | E59 | 190335-190353
99-12831-241 | A60 | D60 | 190399-190417 | E60 | 190419-190437
99-12832-387 | A61 | D61 | 191378-191396 | E61 | 191398-191416
99-12836-30 | A62 | D62 | 195109-195127 | E62 | 195129-195147
99-12844-262 | A63 | D63 | 203827-203845 | E63 | 203847-203865
4-24-74 | A64 | D64 | 210132-210150 | E64 | 210152-210170
4-24-246 | A65 | D65 | 210302-210320 | E65 | 210322-210340
4-24-314 | A66 | D66 | 210370-210388 | E66 | 210390-210408
4-27-190 | A67 | D67 | 211149-211167 | E67 | 211169-211187
5-400-145 | A68 | D68 | 215977-215995 | E68 | 215997-216015
5-400-149 | A69 | D69 | 215981-215999 | E69 | 216001-216019
5-400-175 | A70 | D70 | 216007-216025 | E70 | 216027-216045
5-400-231 | A71 | D71 | 216063-216081 | E71 | 216083-216101
5-400-367 | A72 | D72 | 216199-216217 | E72 | 216219-216237
99-12852-110 | A73 | D73 | 216303-216321 | E73 | 216323-216341
99-12852-325 | A74 | D74 | 216518-216536 | E74 | 216538-216556
4-37-326 | A75 | D75 | 221630-221648 | E75 | 221650-221668
4-37-107 | A76 | D76 | 221848-221866 | E76 | 221868-221886
5-270-92 | A77 | D77 | 225626-225644 | E77 | 225646-225664
99-12860-47 | A78 | D78 | 229368-229386 | E78 | 229388-229406
99-12860-57 | A79 | D79 | 229378-229396 | E79 | 229398-229416
5-402-144 | A80 | D80 | 237536-237554 | E80 | 237556-237574



Mis 1 and Mis 2 respectively refer to microsequencing primers which hybridized with the 
non-coding strand of the PG-3 gene or with the coding strand of the PG-3 gene. 
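The coordinates in Table 4 are consistent with 19-nucleotide primers ending immediately next to the polymorphic base: Mis 1 covers the 19 positions just before the marker and Mis 2 the 19 positions just after it. The sketch below reproduces this relationship for marker A1 (position 1999 in SEQ ID No:1). It describes the layout of the table only and is not a primer design tool; the function name is hypothetical.

```python
def microsequencing_primer_ranges(marker_position: int, length: int = 19):
    """Return ((mis1_start, mis1_end), (mis2_start, mis2_end)) in SEQ ID No:1."""
    mis1 = (marker_position - length, marker_position - 1)  # 3' end abuts the marker
    mis2 = (marker_position + 1, marker_position + length)  # complementary range on the other side
    return mis1, mis2


print(microsequencing_primer_ranges(1999))  # compare D1 and E1 in Table 4
```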
The microsequencing reaction was performed as follows:
After purification of the amplification products, the microsequencing reaction mixture was
prepared by adding, in a 20 µl final volume: 10 pmol microsequencing oligonucleotide, 1 U
Thermosequenase (Amersham E79000G), 1.25 µl Thermosequenase buffer (260 mM Tris-HCl pH
9.5, 65 mM MgCl2), and the two appropriate fluorescent ddNTPs (Perkin Elmer, Dye Terminator
Set 401095) complementary to the nucleotides at the polymorphic site of each biallelic marker
tested, following the manufacturer's recommendations. After 4 minutes at 94°C, 20 PCR cycles of
15 sec at 55°C, 5 sec at 72°C, and 10 sec at 94°C were carried out in a Tetrad PTC-225
thermocycler (MJ Research). The unincorporated dye terminators were then removed by ethanol
precipitation. Samples were finally resuspended in formamide-EDTA loading buffer and heated for
2 min at 95°C before being loaded on a polyacrylamide sequencing gel. The data were collected by
an ABI PRISM 377 DNA sequencer and processed using the GENESCAN software (Perkin Elmer).
Following gel analysis, data were automatically processed with software that allows the
determination of the alleles of biallelic markers present in each amplified fragment.
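Which two ddNTPs are "appropriate" depends on the strand the primer anneals to. The sketch below assumes, as the Mis 1/Mis 2 definitions indicate, that Mis 1 carries the SEQ ID No 1 sequence and anneals to the non-coding strand (so the incorporated base equals the allele), while Mis 2 anneals to the coding strand (so the incorporated base is the allele's complement). `ddntps_for_marker` is a hypothetical helper for illustration only.

```python
# Sketch: which two dye-labelled ddNTPs a microsequencing primer should incorporate.
# Assumption: Mis 1 carries the SEQ ID No 1 (coding-strand) sequence, Mis 2 its complement.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ddntps_for_marker(allele1: str, allele2: str, primer: str) -> list:
    """Return the sorted pair of ddNTPs expected at the polymorphic site."""
    if primer == "mis1":
        # Mis 1 anneals to the non-coding strand: the base added equals the allele.
        bases = (allele1, allele2)
    elif primer == "mis2":
        # Mis 2 anneals to the coding strand: the base added is the allele's complement.
        bases = (COMPLEMENT[allele1], COMPLEMENT[allele2])
    else:
        raise ValueError("primer must be 'mis1' or 'mis2'")
    return sorted(f"dd{base}TP" for base in bases)


# Marker 5-400-145 is listed as "polymorphic base A or G".
print(ddntps_for_marker("A", "G", "mis1"))  # ['ddATP', 'ddGTP']
print(ddntps_for_marker("A", "G", "mis2"))  # ['ddCTP', 'ddTTP']
```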

The software evaluates such factors as whether the intensities of the signals resulting from
the above microsequencing procedures are weak, normal, or saturated, or whether the signals are
ambiguous. In addition, the software identifies significant peaks (according to shape and height
criteria). Among the significant peaks, peaks corresponding to the targeted site are identified based
on their position. When two significant peaks are detected for the same position, each sample is
classified as homozygous or heterozygous based on the height ratio.
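The classification rule can be illustrated with a small sketch. The actual thresholds used by the genotyping software are not given in the text; `min_height` and `het_ratio` below are hypothetical values chosen only to show how a weak-signal check and a peak-height ratio could yield a genotype call.

```python
def call_genotype(height1: float, height2: float,
                  min_height: float = 100.0, het_ratio: float = 0.3) -> str:
    """Classify one sample from the two allele-specific peak heights.

    min_height and het_ratio are illustrative thresholds, not values from the source.
    """
    major = max(height1, height2)
    if major < min_height:
        return "no call"  # signal too weak to interpret
    ratio = min(height1, height2) / major
    if ratio >= het_ratio:
        return "heterozygous"  # both peaks are of comparable height
    return "homozygous allele 1" if height1 > height2 else "homozygous allele 2"


print(call_genotype(1000, 900))  # heterozygous
print(call_genotype(1000, 50))   # homozygous allele 1
```

In practice the thresholds would have to be calibrated on samples of known genotype.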

EXAMPLE 5

PREPARATION OF ANTIBODY COMPOSITIONS TO THE PG-3 PROTEIN

Substantially pure protein or polypeptide is isolated from transfected or transformed cells
containing an expression vector encoding the PG-3 protein or a portion thereof. The concentration of
protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to
the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be
prepared as follows:

A. Monoclonal Antibody Production by Hybridoma Fusion

Monoclonal antibody to epitopes in the PG-3 protein or a portion thereof can be prepared from
murine hybridomas according to the classical method of Kohler, G. and Milstein, C. (1975) or
derivative methods thereof. See also Harlow, E. and Lane, D. (1988).

Briefly, a mouse is repetitively inoculated with a few micrograms of the PG-3 protein or a
portion thereof over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing
cells of the spleen are isolated. The spleen cells are fused by means of polyethylene glycol with mouse
myeloma cells, and the excess unfused cells are destroyed by growth of the system on selective media
comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the
dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing
clones are identified by detection of antibody in the supernatant fluid of the wells by
immunoassay procedures, such as ELISA, as originally described by Engvall (1980), and derivative
methods thereof. Selected positive clones can be expanded and their monoclonal antibody product
harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et
al. (1986).
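The dilution step aims to have most growing wells seeded by a single fused cell. A common way to reason about it, not stated in the original protocol, is to assume cells distribute among wells according to a Poisson law with a mean of λ cells per well. The sketch below uses that assumption; λ = 0.3 is an illustrative value, and all viable seeded cells are assumed to grow.

```python
import math


def limiting_dilution(mean_cells_per_well: float):
    """Poisson estimates for plating a diluted cell suspension into wells.

    Returns (fraction of empty wells, fraction of wells with exactly one cell,
    fraction of growing wells that were seeded by a single cell).
    """
    lam = mean_cells_per_well
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_growth = 1.0 - p_empty
    # Among wells that show growth, the share seeded by exactly one cell.
    frac_clonal = p_single / p_growth
    return p_empty, p_single, frac_clonal


p0, p1, clonal = limiting_dilution(0.3)
print(f"empty {p0:.2f}, single {p1:.2f}, clonal among positives {clonal:.2f}")
# empty 0.74, single 0.22, clonal among positives 0.86
```

Lowering λ increases the share of monoclonal wells at the cost of more empty wells.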

B. Polyclonal Antibody Production by Immunization

Polyclonal antiserum containing antibodies to heterogeneous epitopes in the PG-3 protein
or a portion thereof can be prepared by immunizing a suitable non-human animal with the PG-3
protein or a portion thereof, which can be unmodified or modified to enhance immunogenicity. The
suitable non-human animal is preferably a non-human mammal, usually a mouse, rat,
rabbit, goat, or horse. Alternatively, a crude preparation which has been enriched in PG-3
can be used to generate antibodies. Such proteins, fragments or preparations are
introduced into the non-human mammal in the presence of an appropriate adjuvant known in the
art (e.g., aluminum hydroxide, RIBI, etc.). In addition, the protein, fragment or preparation
can be pretreated with an agent which will increase antigenicity; such agents are known in the art
and include, for example, methylated bovine serum albumin (mBSA), bovine serum albumin
(BSA), Hepatitis B surface antigen, and keyhole limpet hemocyanin (KLH). Serum from the
immunized animal is collected, treated and tested according to known procedures. If the serum
contains polyclonal antibodies to undesired epitopes, the polyclonal antibodies can be purified by
immunoaffinity chromatography.

Effective polyclonal antibody production is affected by many factors related both to the
antigen and to the host species. Also, host animals vary in their response to the site of inoculation
and to the dose, with either inadequate or excessive doses of antigen resulting in low-titer antisera.
Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most
reliable. Techniques for producing and processing polyclonal antisera are known in the art; see, for
example, Mayer and Walker (1987). An effective immunization protocol for rabbits can be found in
Vaitukaitis, J. et al. (1971).
Booster injections can be given at regular intervals, and antiserum harvested when the antibody
titer, as determined semi-quantitatively, for example, by double immunodiffusion in agar
against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et
al. (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum
(about 1-2 µM). Affinity of the antisera for the antigen is determined by preparing competitive
binding curves, as described, for example, by Fisher, D. (1980).

Antibody preparations obtained according to either the monoclonal or the polyclonal protocol
are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances
in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of
antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing
cells expressing the protein or reducing the levels of the protein in the body.

While the preferred embodiment of the invention has been illustrated and described, it will
be appreciated that various changes can be made therein by one skilled in the art without
departing from the spirit and scope of the invention.
SEQUENCE LISTING FREE TEXT

The following free text appears in the accompanying Sequence Listing:
5' regulatory region
3' regulatory region
polymorphic base
or
complement
probe
sequencing oligonucleotide primer
insertion of
exon
WHAT IS CLAIMED:

1. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at
least 15 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span
comprises at least one of the following nucleotide positions of SEQ ID No 1: 1-97921, 98517-103471,
103603-108222, 108390-109221, 109324-114409, 114538-115723, 115957-122102,
122225-126876, 127033-157212, 157808-240825.

2. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at
least 15 nucleotides of SEQ ID No 2 or the complements thereof.

3. An isolated, purified, or recombinant polynucleotide consisting essentially of a
contiguous span of at least 15 nucleotides of any one of SEQ ID Nos 1 and 2 or the complement
thereof, wherein said span includes a PG-3-related biallelic marker in said sequence.

4. A polynucleotide according to claim 3, wherein said PG-3-related biallelic marker is
selected from the group consisting of A1 to A80, and the complements thereof.

5. A polynucleotide according to claim 3, wherein said PG-3-related biallelic marker is
selected from the group consisting of A1 to A5 and A8 to A80, and the complements thereof.

6. A polynucleotide according to claim 3, wherein said PG-3-related biallelic marker is
selected from the group consisting of A6 and A7, and the complements thereof.

7. A polynucleotide according to claim 3, wherein said contiguous span is 18 to 35
nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said
polynucleotide.

8. A polynucleotide according to claim 7, wherein said polynucleotide consists of said
contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker is at
the center of said polynucleotide.

9. A polynucleotide according to claim 7, wherein said polynucleotide consists essentially
of a sequence selected from the following sequences: P1 to P4 and P6 to P80, and the
complementary sequences thereto.

10. A polynucleotide according to any one of claims 1, 2 or 3, wherein the 3' end of said
contiguous span is present at the 3' end of said polynucleotide.

11. A polynucleotide according to claim 3, wherein the 3' end of said contiguous span is
located at the 3' end of said polynucleotide and said biallelic marker is present at the 3' end of said
polynucleotide.

12. An isolated, purified, or recombinant polynucleotide consisting essentially of a
contiguous span of at least 15 nucleotides of any one of SEQ ID Nos 1 and 2 or the complements thereof,
wherein the 3' end of said contiguous span is located at the 3' end of said polynucleotide, and
wherein the 3' end of said polynucleotide is located within 20 nucleotides upstream of a PG-3-related
biallelic marker in said sequence.

13. A polynucleotide according to claim 12, wherein the 3' end of said polynucleotide is
located one nucleotide upstream of said PG-3-related biallelic marker in said sequence.

14. A polynucleotide according to claim 13, wherein said polynucleotide consists
essentially of a sequence selected from the following sequences: D1 to D4, D6 to D80, E1 to E4,
and E6 to E80.

15. An isolated, purified, or recombinant polynucleotide consisting essentially of a
sequence selected from the following sequences: B1 to B52 and C1 to C52.

16. An isolated, purified, or recombinant polynucleotide which encodes a polypeptide
comprising a contiguous span of at least 6 amino acids of SEQ ID No 3.

17. A polynucleotide according to any one of claims 1-16 attached to a solid support.

18. An array of polynucleotides comprising at least one polynucleotide according to claim 17.

19. An array according to claim 18, wherein said array is addressable.

20. A polynucleotide according to any one of claims 1-16 further comprising a label.

21. A recombinant vector comprising a polynucleotide according to any one of claims 1-16.

22. A host cell comprising a recombinant vector according to claim 21.

23. A non-human host animal or mammal comprising a recombinant vector according to
claim 21.

24. A mammalian host cell comprising a PG-3 gene disrupted by homologous
recombination with a knock out vector, comprising a polynucleotide according to any one of claims
1-16.

25. A non-human host mammal comprising a PG-3 gene disrupted by homologous
recombination with a knock out vector, comprising a polynucleotide according to any one of claims
1-16.

26. A method of genotyping comprising determining the identity of a nucleotide at a PG-3-related
biallelic marker or the complement thereof in a biological sample.

27. A method according to claim 26, wherein said biological sample is derived from a
single subject.

28. A method according to claim 27, wherein the identity of the nucleotides at said biallelic
marker is determined for both copies of said biallelic marker present in said individual's genome.

29. A method according to claim 26, wherein said biological sample is derived from
multiple subjects.

30. A method according to claim 26, further comprising amplifying a portion of said
sequence comprising the biallelic marker prior to said determining step.

31. A method according to claim 30, wherein said amplifying is performed by PCR.

32. A method according to claim 26, wherein said determining is performed by a
hybridization assay.

33. A method according to claim 26, wherein said determining is performed by a
sequencing assay.

34. A method according to claim 26, wherein said determining is performed by a
microsequencing assay.
35. A method according to claim 26, wherein said determining is performed by an enzyme-based
mismatch detection assay.

36. A method of estimating the frequency of an allele of a PG-3-related biallelic marker in
a population comprising:

a) genotyping individuals from said population for said biallelic marker according to the
method of claim 26; and

b) determining the proportional representation of said biallelic marker in said population.

37. A method of detecting an association between a genotype and a trait, comprising the
steps of:

a) determining the frequency of at least one PG-3-related biallelic marker in a trait positive
population according to the method of claim 36;

b) determining the frequency of at least one PG-3-related biallelic marker in a control
population according to the method of claim 36; and

c) determining whether a statistically significant association exists between said genotype
and said trait.

38. A method of estimating the frequency of a haplotype for a set of biallelic markers in a
population, comprising:

a) genotyping at least one PG-3-related biallelic marker according to claim 27 for each
individual in said population;

b) genotyping a second biallelic marker by determining the identity of the nucleotides at
said second biallelic marker for both copies of said second biallelic marker present in the genome of
each individual in said population; and

c) applying a haplotype determination method to the identities of the nucleotides
determined in steps a) and b) to obtain an estimate of said frequency.

39. A method according to claim 38, wherein said haplotype determination method is
selected from the group consisting of asymmetric PCR amplification, double PCR amplification of
specific alleles, the Clark algorithm, and an expectation-maximization algorithm.

40. A method of detecting an association between a haplotype and a trait, comprising the
steps of:

a) estimating the frequency of at least one haplotype in a trait positive population according
to the method of claim 38;

b) estimating the frequency of said haplotype in a control population according to the
method of claim 38; and

c) determining whether a statistically significant association exists between said haplotype
and said trait.

41. A method according to claim 37, wherein said genotyping steps a) and b) are
performed on a single pooled biological sample derived from each of said populations.

42. A method according to claim 37, wherein said genotyping steps a) and b) are performed
separately on biological samples derived from each individual in said populations.

43. A method according to either claim 37 or 40, wherein said trait is cancer susceptibility.

44. A method according to either claim 37 or 40, wherein said control population is a trait
negative population.

45. A method according to either claim 37 or 40, wherein said case control population is a
random population.

46. Use of a polynucleotide comprising a contiguous span of at least 15 nucleotides of a
sequence selected from the group consisting of SEQ ID Nos 1 and 2, amplicons 5-390, 5-391,
5-392, 4-59, 4-58, 4-54, 4-51, 99-86, 4-88, 5-397, 5-398, 99-12738, 99-109, 99-12749, 4-21, 4-23,
99-12753, 5-364, 99-12755, 4-87, 99-12757, 99-12758, 4-105, 4-45, 4-44, 4-86, 4-84, 99-78,
99-12767, 4-80, 4-36, 4-35, 99-12771, 99-12774, 99-12776, 99-12781, 4-104, 99-12818, 99-24807,
99-12827, 99-12831, 99-12832, 99-12836, 99-12844, 4-24, 4-27, 5-400, 99-12852, 4-37, 5-270,
99-12860, and 5-402 or the complementary sequence thereto for determining the identity of the
nucleotide at a PG-3-related biallelic marker.

47. Use according to claim 46 in a microsequencing assay, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide and wherein the 3' end of said
polynucleotide is located 1 nucleotide upstream of said PG-3-related biallelic marker in said
sequence.

48. Use according to claim 46 in a hybridization assay, wherein said contiguous span
includes said PG-3-related biallelic marker.

49. Use according to claim 46 in a specific amplification assay, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide and said biallelic marker is present at
the 3' end of said polynucleotide.

50. Use according to claim 46 in a sequencing assay, wherein the 3' end of said contiguous
span is located at the 3' end of said polynucleotide.

51. Use according to any one of claims 46-50, wherein said PG-3-related biallelic marker is a
biallelic marker selected from the group consisting of A1 to A80.

52. An isolated, purified, or recombinant polypeptide comprising a contiguous span of at
least 6 amino acids of SEQ ID No 3.

53. An isolated or purified antibody composition capable of selectively binding to an
epitope-containing fragment of a polypeptide according to claim 52.

54. A method according to any one of claims 26-45, wherein said PG-3-related biallelic
marker is selected from the group consisting of A1 to A80 and the complements thereof.

55. A diagnostic kit comprising a polynucleotide according to any one of claims 3-15.

56. A computer readable medium having stored thereon a sequence selected from the group
consisting of a nucleic acid code comprising one of the following:

a) a contiguous span of at least 15 nucleotides of SEQ ID No 1, wherein said contiguous
span comprises at least one of the following nucleotide positions of SEQ ID No 1: 1-97921, 98517-103471,
103603-108222, 108390-109221, 109324-114409, 114538-115723, 115957-122102,
122225-126876, 127033-157212, 157808-240825;

b) a contiguous span of at least 15 nucleotides of SEQ ID No 2 or the complements thereof;
and

c) a nucleotide sequence complementary to any one of the preceding nucleotide sequences.

57. A computer readable medium having stored thereon a sequence consisting of a
polypeptide code comprising a contiguous span of at least 6 amino acids of SEQ ID No 3.

58. A computer system comprising a processor and a data storage device wherein said data
storage device is a computer readable medium according to claim 56 or 57.

59. A computer system according to claim 58, further comprising a sequence comparer and
a data storage device having reference sequences stored thereon.

60. A computer system of Claim 59 wherein said sequence comparer comprises a computer
program which indicates polymorphisms.

61. A computer system of Claim 58 further comprising an identifier which identifies
features in said sequence.

62. A method for comparing a first sequence to a reference sequence, comprising the steps
of:

reading said first sequence and said reference sequence through use of a computer program
which compares sequences; and

determining differences between said first sequence and said reference sequence with said
computer program,

wherein said first sequence is selected from the group consisting of a nucleic acid code
comprising one of the following:

a) a contiguous span of at least 15 nucleotides of SEQ ID No 1, wherein said contiguous
span comprises at least one of the following nucleotide positions of SEQ ID No 1: 1-97921, 98517-103471,
103603-108222, 108390-109221, 109324-114409, 114538-115723, 115957-122102,
122225-126876, 127033-157212, 157808-240825;

b) a contiguous span of at least 15 nucleotides of SEQ ID No 2 or the complements thereof;
and

c) a nucleotide sequence complementary to any one of the preceding nucleotide sequences;
and,

d) a polypeptide code comprising a contiguous span of at least 6 amino acids of SEQ ID No 3.
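Claims 36 and 37 describe estimating allele frequencies by genotyping and then determining whether a statistically significant association exists between a genotype and a trait. The claims do not name a particular test; the sketch below assumes a Pearson chi-square test with one degree of freedom on a 2x2 table of allele counts (trait positive versus control), which is one common choice. The function names and the two-letter genotype encoding (e.g. "AG") are illustrative assumptions, and the counts in the usage example are invented, not data from this study.

```python
import math


def allele_frequency(genotypes, allele):
    """Frequency of `allele` among diploid genotypes written as 2-letter strings, e.g. "AG"."""
    count = sum(g.count(allele) for g in genotypes)
    return count / (2 * len(genotypes))


def allelic_chi_square(case_genotypes, control_genotypes, allele):
    """Pearson chi-square (1 df) on the 2x2 table of allele counts; returns (chi2, p).

    Assumes both alleles are observed at least once, so no expected count is zero.
    """
    table = []
    for group in (case_genotypes, control_genotypes):
        n_alleles = 2 * len(group)
        a = sum(g.count(allele) for g in group)
        table.append((a, n_alleles - a))
    total = sum(sum(row) for row in table)
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for row in table:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / total
            chi2 += (row[j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Invented example: 100 trait positive and 100 control individuals.
cases = ["AA"] * 30 + ["AG"] * 50 + ["GG"] * 20
controls = ["AA"] * 20 + ["AG"] * 50 + ["GG"] * 30
print(f"{allele_frequency(cases, 'A'):.2f}")  # 0.55
chi2, p = allelic_chi_square(cases, controls, "A")
print(f"{chi2:.2f} {p:.4f}")  # 4.00 0.0455
```

For small counts an exact test would usually be preferred over the chi-square approximation.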
AMENDED CLAIMS
[received by the International Bureau on 25 January 2001 (25.01.01);
original claims 1, 2 and 56 amended; remaining claims unchanged (2 pages)]

1. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at
least 200 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span
comprises at least one of the following nucleotide positions of SEQ ID No 1: 1-97921, 98517-103471,
103603-108222, 108390-109221, 109324-114409, 114538-115723, 115957-122102,
122225-126876, 127033-157212, 157808-240825.

2. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at
least 200 nucleotides of SEQ ID No 2 or the complements thereof.

3. An isolated, purified, or recombinant polynucleotide consisting essentially of a
contiguous span of at least 15 nucleotides of any one of SEQ ID Nos 1 and 2 or the complement
thereof, wherein said span includes a PG-3-related biallelic marker in said sequence.

4. A polynucleotide according to claim 3, wherein said PG-3-related biallelic marker is
selected from the group consisting of A1 to A80, and the complements thereof.

5. A polynucleotide according to claim 3, wherein said PG-3-related biallelic marker is
selected from the group consisting of A1 to A5 and A8 to A80, and the complements thereof.

6. A polynucleotide according to claim 3, wherein said PG-3-related biallelic marker is
selected from the group consisting of A6 and A7, and the complements thereof.

7. A polynucleotide according to claim 3, wherein said contiguous span is 18 to 35
nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said
polynucleotide.

8. A polynucleotide according to claim 7, wherein said polynucleotide consists of said
contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker is at
the center of said polynucleotide.

9. A polynucleotide according to claim 7, wherein said polynucleotide consists essentially
of a sequence selected from the following sequences: P1 to P4 and P6 to P80, and the
complementary sequences thereto.

10. A polynucleotide according to any one of claims 1, 2 or 3, wherein the 3' end of said
contiguous span is present at the 3' end of said polynucleotide.
49. Use according to claim 46 in a specific amplification assay, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide and said biallelic marker is present at
the 3' end of said polynucleotide.

50. Use according to claim 46 in a sequencing assay, wherein the 3' end of said contiguous
span is located at the 3' end of said polynucleotide.

51. Use according to any one of claims 46-50, wherein said PG-3-related biallelic marker is a
biallelic marker selected from the group consisting of A1 to A80.

52. An isolated, purified, or recombinant polypeptide comprising a contiguous span of at
least 6 amino acids of SEQ ID No 3.

53. An isolated or purified antibody composition capable of selectively binding to an
epitope-containing fragment of a polypeptide according to claim 52.

54. A method according to any one of claims 26-45, wherein said PG-3-related biallelic
marker is selected from the group consisting of A1 to A80 and the complements thereof.

55. A diagnostic kit comprising a polynucleotide according to any one of claims 3-15.

56. A computer readable medium having stored thereon at least 2 nucleic acid code
sequences comprising any one of the following:

a) a contiguous span of at least 200 nucleotides of SEQ ID No 1, wherein said contiguous
span comprises at least one of the following nucleotide positions of SEQ ID No 1: 1-97921, 98517-103471,
103603-108222, 108390-109221, 109324-114409, 114538-115723, 115957-122102,
122225-126876, 127033-157212, 157808-240825;

b) a contiguous span of at least 200 nucleotides of SEQ ID No 2 or the complements
thereof; and

c) a nucleotide sequence complementary to any one of the preceding nucleotide sequences.

57. A computer readable medium having stored thereon a sequence consisting of a
polypeptide code comprising a contiguous span of at least 6 amino acids of SEQ ID No 3.

58. A computer system comprising a processor and a data storage device wherein said data
storage device is a computer readable medium according to claim 56 or 57.
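The computer system of claims 58 to 62 compares a stored sequence with a reference sequence and indicates differences such as polymorphisms. A minimal sketch of such a sequence comparer follows. It assumes the two sequences are already aligned without gaps; `find_differences` is a hypothetical name, and practical comparison programs also handle insertions and deletions.

```python
def find_differences(first: str, reference: str, offset: int = 1):
    """List (position, reference_base, first_base) for each mismatching base.

    Assumes the sequences are pre-aligned and of equal length; positions are
    1-based, counted from `offset` in reference coordinates.
    """
    if len(first) != len(reference):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (offset + i, ref_base, base)
        for i, (base, ref_base) in enumerate(zip(first.upper(), reference.upper()))
        if base != ref_base
    ]


print(find_differences("ACGTTCA", "ACGTACA"))  # [(5, 'A', 'T')]
```

Passing the reference start coordinate as `offset` would report differences directly in SEQ ID No 1 numbering.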
<221> primer_bind
<222> 2108..2125
<223> 5-390.rp complement

<220>
<221> primer_bind
<222> 4559..4577
<223> 5-391.pu

<220>
<221> primer_bind
<222> 4891..4908
<223> 5-391.rp complement

<220>
<221> primer_bind
<222> 10007..10025
<223> 5-392.pu

<220>
<221> primer_bind
<222> 10411..10430
<223> 5-392.rp complement

<220>
<221> primer_bind
<222> 39556..39574
<223> 4-59.rp

<220>
<221> primer_bind
<222> 39877..39896
<223> 4-58.rp

<220>
<221> primer_bind
<222> 39953..39970
<223> 4-59.pu complement

<220>
<221> primer_bind
<222> 40242..40259
<223> 4-58.pu complement

<220>
<221> primer_bind
<222> 41137..41154
<223> 4-54.rp

<220>
<221> primer_bind
<222> 41564..41581
<223> 4-54.pu complement

<220>
<221> primer_bind
<222> 42122..42141
<223> 4-51.rp

<220>
<221> primer_bind
<222> 42526..42543
<223> 4-51.pu complement
<110> Genset

<120> PG-3 and biallelic markers thereof

<130> 68.W01

<140> US 60/149,941
<141> 1999-08-19

<160> 5

<170> Patent.pm

<210> 1

<211> 240825

<212> DNA

<213> Homo sapiens

<220>
<221> misc_feature
<222> 1..2000
<223> 5' regulatory region

<220>
<221> exon
<222> 2001..2079
<223> exon A

<220>
<221> exon
<222> 4627..4718
<223> exon B

<220>
<221> exon
<222> 10115..10233
<223> exon C

<220>
<221> exon
<222> 26810..26897
<223> exon D

<220>
<221> exon
<222> 31357..31471
<223> exon E

<220>
<221> exon
<222> 34261..34404
<223> exon F

<220>
<221> exon
<222> 37377..37466
<223> exon S

<220>
<221> exon
<222> 39704..40858
<223> exon T

WO 01/14550

<220>
<221> exon
<222> 50436..50545
<223> exon G

<220>
<221> exon
<222> 72881..72918
<223> exon H

<220>
<221> exon
<222> 75989..76151
<223> exon I

<220>
<221> exon
<222> 95111..95188
<223> exon J

<220>
<221> exon
<222> 216015..216252
<223> exon K

<220>
<221> exon
<222> 237526..238825
<223> exon L

<220>
<221> misc_feature
<222> 238826..240825
<223> 3' regulatory region



<220> 

<221> allele 
<222> 1999 

<223> 5-390-177 : polymorphic base G or C 
<220> 

<221> allele 
<222> 4601 

<223> 5-391-43 : polymorphic base A or G 



<220> 

<221> allele 
<222> 10228 

<223> 5-392-222 : polymorphic base G or T 
<220> 

<221> allele 
<222> 10286 

<223> 5-392-280 : polymorphic base G or T 
<220> 

<221> allele 
<222> 10370 

<223> 5-392-364 : insertion of G 



<220> 

<221> allele 
<222> 39944 
<223> 4-58-318 : polymorphic base G or T
<220> 

<221> allele
<222> 39973 

<223> 4-58-289 : polymorphic base G or C 
<220> 

<221> allele 
<222> 41385 

<223> 4-54-199 : polymorphic base A or C 
<220> 

<221> allele 
<222> 41404 

<223> 4-54-180 : polymorphic base A or C 
<220> 

<221> allele 
<222> 42232 

<223> 4-51-312 : polymorphic base G or C 
<220> 

<221> allele 
<222> 67475 

<223> 99-86-266 : polymorphic base A or G 
<220> 

<221> allele 
<222> 69521 

<223> 4-88-107 : polymorphic base A or G
<220> 

<221> allele 
<222> 72838 

<223> 5-397-141 : polymorphic base G or T 
<220> 

<221> allele 
<222> 76060 

<223> 5-398-203 : polymorphic base A or C 
<220> 

<221> allele 
<222> 81253 

<223> 99-12738-248 : polymorphic base A or C 
<220> 

<221> allele 
<222> 83921 

<223> 99-109-358 : polymorphic base A or C 
<220> 

<221> allele 
<222> 91917 

<223> 99-12749-175 : polymorphic base C or T 
<220> 

<221> allele 
<222> 95349 

<223> 4-21-154 : polymorphic base C or T 



<220> 
<221> allele
<222> 95511 

<223> 4-21-317 : polymorphic base G or T 
<220> 

<221> allele 
<222> 96190 

<223> 4-23-326 : polymorphic base A or G 
<220> 

<221> allele
<222> 97294 

<223> 99-12753-34 : polymorphic base A or T 
<220> 

<221> allele 
<222> 98024 

<223> 5-364-252 : polymorphic base G or T 
<220> 

<221> allele
<222> 98914 

<223> 99-12755-280 : polymorphic base A or G 
<220> 

<221> allele 
<222> 98963 

<223> 99-12755-329 : polymorphic base A or C 
<220> 

<221> allele 
<222> 103593 

<223> 4-87-212 : polymorphic base A or G 
<220> 

<221> allele 
<222> 104398 

<223> 99-12757-318 : polymorphic base C or T 
<220> 

<221> allele
<222> 106373 

<223> 99-12758-102 : polymorphic base A or G 
<220> 

<221> allele 
<222> 106407 

<223> 99-12758-136 : polymorphic base C or T 
<220> 

<221> allele 
<222> 108315 

<223> 4-105-98 : polymorphic base A or G 
<220> 

<221> allele 
<222> 108327 

<223> 4-105-86 : polymorphic base A or G 
<220> 

<221> allele 
<222> 108472 

<223> 4-45-49 : polymorphic base C or T 
<220>
<221> allele
<222> 109196
<223> 4-44-277 : polymorphic base C or T

<220>
<221> allele
<222> 114604
<223> 4-86-60 : polymorphic base G or C

<220>
<221> allele
<222> 115716
<223> 4-84-334 : polymorphic base A or G

<220>
<221> allele
<222> 122083
<223> 99-78-321 : polymorphic base A or T

<220>
<221> allele
<222> 123124
<223> 99-12767-36 : polymorphic base G or C

<220>
<221> allele
<222> 123231
<223> 99-12767-143 : polymorphic base C or T

<220>
<221> allele
<222> 123277
<223> 99-12767-189 : polymorphic base C or T

<220>
<221> allele
<222> 123468
<223> 99-12767-380 : polymorphic base A or G

<220>
<221> allele
<222> 126738
<223> 4-80-328 : polymorphic base C or T

<220>
<221> allele
<222> 128210
<223> 4-36-384 : polymorphic base G or C

<220>
<221> allele
<222> 128330
<223> 4-36-264 : polymorphic base A or G

<220>
<221> allele
<222> 128333
<223> 4-36-261 : polymorphic base A or C

<220>
<221> allele
<222> 128594

<223> 4-35-333 : polymorphic base A or C 
<220> 

<221> allele
<222> 128687 

<223> 4-35-240 : polymorphic base G or C 
<220> 

<221> allele 
<222> 128754 

<223> 4-35-173 : polymorphic base A or T 
<220> 

<221> allele
<222> 128794 

<223> 4-35-133 : polymorphic base C or T 
<220> 

<221> allele 
<222> 130805 

<223> 99-12771-59 : polymorphic base G or T 
<220> 

<221> allele 
<222> 133206 

<223> 99-12774-334 : polymorphic base A or C 
<220> 

<221> allele
<222> 135386 

<223> 99-12776-358 : polymorphic base A or G 
<220> 

<221> allele 
<222> 139389 

<223> 99-12781-113 : polymorphic base A or G 
<220> 

<221> allele 
<222> 157535 

<223> 4-104-298 : polymorphic base G or C 
<220> 

<221> allele
<222> 157579 

<223> 4-104-254 : polymorphic base A or G 
<220> 

<221> allele
<222> 157583 

<223> 4-104-250 : polymorphic base C or T 
<220> 

<221> allele 
<222> 157619 

<223> 4-104-214 : polymorphic base A or G 
<220> 

<221> allele 
<222> 172980 

<223> 99-12818-289 : polymorphic base C or T 
<220>

<221> allele 
<222> 180622 

<223> 99-24807-271 : polymorphic base C or T 
<220> 

<221> allele 
<222> 180809 

<223> 99-24807-84 : polymorphic base A or G 
<220> 

<221> allele 
<222> 190334 

<223> 99-12831-157 : polymorphic base A or G 
<220> 

<221> allele 
<222> 190418 

<223> 99-12831-241 : polymorphic base C or T 
<220> 

<221> allele 
<222> 191397 

<223> 99-12832-387 : polymorphic base C or T 
<220> 

<221> allele 
<222> 195128 

<223> 99-12836-30 : polymorphic base G or C 
<220> 

<221> allele 
<222> 203846 

<223> 99-12844-262 : polymorphic base G or C 
<220> 

<221> allele 
<222> 210151 

<223> 4-24-74 : polymorphic base C or T 
<220> 

<221> allele 
<222> 210321 

<223> 4-24-246 : polymorphic base C or T 
<220> 

<221> allele 
<222> 210389 

<223> 4-24-314 : polymorphic base G or C 
<220> 

<221> allele 
<222> 211168 

<223> 4-27-190 : polymorphic base A or G 
<220> 

<221> allele 
<222> 215996 

<223> 5-400-145 : polymorphic base A or G 
<220> 

<221> allele 
<222> 216000 
<223> 5-400-149 : polymorphic base G or C
<220> 

<221> allele 
<222> 216026 

<223> 5-400-175 : polymorphic base C or T 
<220> 

<221> allele 
<222> 216082 

<223> 5-400-231 : polymorphic base C or T 
<220> 

<221> allele 
<222> 216218 

<223> 5-400-367 : polymorphic base A or C 
<220> 

<221> allele
<222> 216322 

<223> 99-12852-110 : polymorphic base G or T 
<220> 

<221> allele 
<222> 216537 

<223> 99-12852-325 : polymorphic base A or G 
<220> 

<221> allele 
<222> 221649 

<223> 4-37-326 : polymorphic base A or C 
<220> 

<221> allele 
<222> 221867 

<223> 4-37-107 : polymorphic base A or G 
<220> 

<221> allele 
<222> 225645 

<223> 5-270-92 : polymorphic base G or C 
<220> 

<221> allele 
<222> 229387 

<223> 99-12860-47 : polymorphic base A or G 
<220> 

<221> allele 
<222> 229397 

<223> 99-12860-57 : polymorphic base A or T 
<220> 

<221> allele 
<222> 237555 

<223> 5-402-144 : polymorphic base C or T 
<220> 

<221> primer_bind 
<222> 1823. .1840 
<223> 5-390. pu

<220>

<221> primer_bind 
<222> 67289. .67309 
<223> 99-86. rp 

<220> 

<221> primer_bind

<222> 67724. .67741

<223> 99-86. pu complement 

<220> 

<221> primer_bind 
<222> 69182. .69200
<223> 4-88. rp 

<220> 

<221> primer_bind 
<222> 69609. .69626 
<223> 4-88. pu complement 

<220> 

<221> primer_bind 
<222> 72698. .72715 
<223> 5-397. pu 

<220> 

<221> primer_bind

<222> 73099. .73117 

<223> 5-397. rp complement 

<220> 

<221> primer_bind 
<222> 75858. .75877 
<223> 5-398. pu 

<220> 

<221> primer_bind 

<222> 76289. .76306 

<223> 5-398. rp complement 

<220> 

<221> primer_bind 
<222> 81006. .81025 
<223> 99-12738. pu 

<220> 

<221> primer_bind 

<222> 81466. .81485 

<223> 99-12738. rp complement 

<220> 

<221> primer_bind
<222> 83564. .83582 
<223> 99-109. pu 

<220> 

<221> primer_bind

<222> 83990. .84007 

<223> 99-109. rp complement

WO 01/14550
PCT/IB00/01098

<220>

<221> primer_bind
<222> 91743. .91763
<223> 99-12749. pu 

<220> 

<221> primer_bind 

<222> 92123. .92142 

<223> 99-12749. rp complement 

<220> 

<221> primer_bind 
<222> 95196. .95214 
<223> 4-21. pu 

<220> 

<221> primer_bind

<222> 95600. .95619 

<223> 4-21. rp complement 

<220> 

<221> primer_bind 
<222> 95865. .95882 
<223> 4-23. pu 

<220> 

<221> primer_bind

<222> 96210. .96229 

<223> 4-23. rp complement

<220> 

<221> primer_bind
<222> 97261. .97278 
<223> 99-12753. pu 

<220> 

<221> primer_bind

<222> 97728. .97747 

<223> 99-12753. rp complement 

<220> 

<221> primer_bind
<222> 97831. .97849 
<223> 5-364. rp 

<220> 

<221> primer_bind

<222> 98256. .98275 

<223> 5-364. pu complement 

<220> 

<221> primer_bind
<222> 98638. .98656 
<223> 99-12755. pu 

<220> 

<221> primer_bind

<222> 99111. .99131 

<223> 99-12755. rp complement 

<220> 

<221> primer_bind
<222> 103376. .103395 
<223> 4-87. rp 
<220>

<221> primer_bind 
<222> 103801. .103818 
<223> 4-87. pu complement

<220> 

<221> primer_bind 
<222> 104081. .104100 
<223> 99-12757. pu 

<220> 

<221> primer_bind 

<222> 104619. .104636 

<223> 99-12757. rp complement 

<220> 

<221> primer_bind
<222> 106272. .106291 
<223> 99-12758. pu 

<220> 

<221> primer_bind

<222> 106780. .106799 

<223> 99-12758. rp complement 

<220> 

<221> primer_bind 
<222> 108200. .108218 
<223> 4-105. rp 

<220> 

<221> primer_bind
<222> 108223. .108246 
<223> 4-45. rp 

<220> 

<221> primer_bind 
<222> 108390. .108412 
<223> 4-105. pu complement 

<220> 

<221> primer_bind
<222> 108499. .108520
<223> 4-45. pu complement

<220> 

<221> primer_bind
<222> 109123. .109142 
<223> 4-44. rp 

<220> 

<221> primer_bind 
<222> 109454. .109471 
<223> 4-44. pu complement

<220> 

<221> primer_bind 
<222> 114217. .114234 
<223> 4-86. rp 

<220> 

<221> primer_bind
<222> 114646. .114663
<223> 4-86. pu complement
<220> 

<221> primer_bind
<222> 115630. .115647 
<223> 4-84. rp 

<220> 

<221> primer_bind
<222> 116031. .116049
<223> 4-84. pu complement

<220> 

<221> primer_bind 
<222> 121991. .122011 
<223> 99-78. rp 

<220> 

<221> primer_bind
<222> 122384. .122401 
<223> 99-78. pu complement 

<220> 

<221> primer_bind
<222> 123089. .123106 
<223> 99-12767. pu 

<220> 

<221> primer_bind 

<222> 123565. .123583 

<223> 99-12767. rp complement 

<220> 

<221> primer_bind
<222> 126711. .126729 
<223> 4-80. rp 

<220> 

<221> primer_bind 
<222> 127048. .127065 
<223> 4-80. pu complement 

<220> 

<221> primer_bind
<222> 128162. .128179 
<223> 4-36. rp 

<220> 

<221> primer_bind

<222> 128480. .128497 

<223> 4-35. rp 

<220> 

<221> primer_bind

<222> 128573. .128590 

<223> 4-36. pu complement 

<220> 

<221> primer_bind

<222> 128909. .128926 

<223> 4-35. pu complement 



<220> 
<221> primer_bind
<222> 130747. .130764 
<223> 99-12771. pu 

<220> 

<221> primer_bind

<222> 131254. .131273 

<223> 99-12771. rp complement 

<220> 

<221> primer_bind 
<222> 132873. .132892 
<223> 99-12774. pu 

<220> 

<221> primer_bind

<222> 133305. .133325 

<223> 99-12774. rp complement 

<220> 

<221> primer_bind 
<222> 135029. .135048 
<223> 99-12776. pu 

<220> 

<221> primer_bind 

<222> 135458. .135478 

<223> 99-12776. rp complement 

<220> 

<221> primer_bind
<222> 139277. .139296 
<223> 99-12781. pu 

<220> 

<221> primer_bind

<222> 139724. .139742 

<223> 99-12781. rp complement 

<220> 

<221> primer_bind
<222> 157181. .157199 
<223> 4-104. rp 

<220> 

<221> primer_bind
<222> 157814. .157832 
<223> 4-104. pu complement 

<220> 

<221> primer_bind 
<222> 172692. .172709 
<223> 99-12818. pu 

<220> 

<221> primer_bind

<222> 173072. .173091 

<223> 99-12818. rp complement 

<220> 

<221> primer_bind 
<222> 180248. .180268
<223> 99-24807. rp 
<220>

<221> primer_bind

<222> 180874. .180892 

<223> 99-24807. pu complement 

<220> 

<221> primer_bind 
<222> 184662. .184680 
<223> 99-12827. pu 

<220> 

<221> primer_bind 

<222> 185138. .185156 

<223> 99-12827. rp complement 

<220> 

<221> primer_bind
<222> 190178. .190196 
<223> 99-12831. pu 

<220> 

<221> primer_bind

<222> 190643. .190663 

<223> 99-12831. rp complement 

<220> 

<221> primer_bind 
<222> 191011. .191030 
<223> 99-12832. pu

<220> 

<221> primer_bind 

<222> 191441. .191460 

<223> 99-12832. rp complement

<220> 

<221> primer_bind 
<222> 195099. .195116 
<223> 99-12836. pu 

<220> 

<221> primer_bind 

<222> 195568. .195587 

<223> 99-12836. rp complement

<220> 

<221> primer_bind
<222> 203585. .203602
<223> 99-12844. pu

<220> 

<221> primer_bind 

<222> 204095. .204115 

<223> 99-12844. rp complement 

<220> 

<221> primer_bind
<222> 210079. .210096 
<223> 4-24. pu 

<220> 

<221> primer_bind
<222> 210476. .210495
<223> 4-24. rp complement

<220> 

<221> primer_bind
<222> 210979. .210996 
<223> 4-27. pu 

<220> 

<221> primer_bind 
<222> 211382. .211401 
<223> 4-27. rp complement

<220> 

<221> primer_bind 
<222> 215852. .215870
<223> 5-400. pu 

<220> 

<221> primer_bind 
<222> 216213. .216231 
<223> 99-12852. pu 

<220> 

<221> primer_bind
<222> 216253. .216271
<223> 5-400. rp complement

<220> 

<221> primer_bind

<222> 216708. .216728

<223> 99-12852. rp complement

<220> 

<221> primer_bind 
<222> 221530. .221549 
<223> 4-37. rp 

<220> 

<221> primer_bind
<222> 221956. .221973 
<223> 4-37. pu complement 

<220> 

<221> primer_bind 
<222> 225554. .225572 
<223> 5-270. pu 

<220> 

<221> primer_bind 
<222> 225827. .225845
<223> 5-270. rp complement 

<220> 

<221> primer_bind 
<222> 229341. .229359 
<223> 99-12860. pu 

<220> 

<221> primer_bind

<222> 229770. .229790 

<223> 99-12860. rp complement 
<220>

<221> primer_bind
<222> 237412. .237429
<223> 5-402. pu 

<220> 

<221> primer_bind
<222> 237747. .237766 
<223> 5-402. rp complement 

<220> 

<221> primer_bind 
<222> 1980. .1998 
<223> 5-390-177. mis 

<220> 

<221> primer_bind

<222> 2000. .2018 

<223> 5-390-177. mis complement 

<220> 

<221> primer_bind
<222> 4582. .4600 
<223> 5-391-43. mis 

<220> 

<221> primer_bind 

<222> 4602. .4620 

<223> 5-391-43. mis complement 

<220> 

<221> primer_bind
<222> 10209. .10227 
<223> 5-392-222. mis 

<220> 

<221> primer_bind 

<222> 10229. .10247 

<223> 5-392-222. mis complement

<220> 

<221> primer_bind
<222> 10267. .10285 
<223> 5-392-280. mis 

<220> 

<221> primer_bind

<222> 10287. .10305 

<223> 5-392-280. mis complement 

<220> 

<221> primer_bind
<222> 39925. .39943
<223> 4-58-318. mis 

<220> 

<221> primer_bind

<222> 39945. .39963 

<223> 4-58-318. mis complement 

<220> 

<221> primer_bind
<222> 39954. .39972
<223> 4-58-289. mis
<220> 

<221> primer_bind

<222> 39974. .39992 

<223> 4-58-289. mis complement 

<220> 

<221> primer_bind
<222> 41366. .41384 
<223> 4-54-199. mis 

<220> 

<221> primer_bind
<222> 41385. .41403 
<223> 4-54-180. mis 

<220> 

<221> primer_bind

<222> 41386. .41404

<223> 4-54-199. mis complement

<220> 

<221> primer_bind

<222> 41405. .41423 

<223> 4-54-180. mis complement 

<220> 

<221> primer_bind
<222> 42213. .42231 
<223> 4-51-312. mis 

<220> 

<221> primer_bind

<222> 42233. .42251

<223> 4-51-312. mis complement 

<220> 

<221> primer_bind
<222> 67456. .67474
<223> 99-86-266. mis

<220> 

<221> primer_bind

<222> 67476. .67494 

<223> 99-86-266. mis complement 

<220> 

<221> primer_bind
<222> 69502. .69520 
<223> 4-88-107. mis 

<220> 

<221> primer_bind

<222> 69522. .69540 

<223> 4-88-107. mis complement 

<220> 

<221> primer_bind 
<222> 72819. .72837 
<223> 5-397-141. mis 



<220> 
<221> primer_bind
<222> 72839. .72857 
<223> 5-397-141. mis complement

<220> 

<221> primer_bind
<222> 76041. .76059 
<223> 5-398-203. mis 

<220> 

<221> primer_bind
<222> 76061. .76079
<223> 5-398-203. mis complement

<220> 

<221> primer_bind
<222> 81234. .81252 
<223> 99-12738-248. mis 

<220> 

<221> primer_bind

<222> 81254. .81272 

<223> 99-12738-248. mis complement 

<220> 

<221> primer_bind 
<222> 83902. .83920 
<223> 99-109-358. mis 

<220> 

<221> primer_bind

<222> 83922. .83940

<223> 99-109-358. mis complement

<220> 

<221> primer_bind 
<222> 91898. .91916 
<223> 99-12749-175. mis 

<220> 

<221> primer_bind

<222> 91918. .91936 

<223> 99-12749-175. mis complement 

<220> 

<221> primer_bind 
<222> 95330. .95348 
<223> 4-21-154. mis 

<220> 

<221> primer_bind 

<222> 95350. .95368 

<223> 4-21-154. mis complement 

<220> 

<221> primer_bind
<222> 95492. .95510 
<223> 4-21-317. mis 

<220> 

<221> primer_bind 

<222> 95512. .95530 

<223> 4-21-317. mis complement 
<220>

<221> primer_bind 
<222> 96171. .96189 
<223> 4-23-326. mis 



<220> 

<221> primer_bind 

<222> 96191. .96209 

<223> 4-23-326. mis complement 



<220> 

<221> primer_bind 
<222> 97275. .97293 
<223> 99-12753-34. mis 



<220> 

<221> primer_bind 

<222> 97295. .97313 

<223> 99-12753-34. mis complement



<220> 

<221> primer_bind
<222> 98005. .98023 
<223> 5-364-252. mis 



<220> 

<221> primer_bind 

<222> 98025. .98043 

<223> 5-364-252. mis complement



<220> 

<221> primer_bind 
<222> 98895. .98913 
<223> 99-12755-280. mis 



<220> 

<221> primer_bind

<222> 98915. .98933 

<223> 99-12755-280. mis complement 



<220> 

<221> primer_bind 
<222> 98944. .98962 
<223> 99-12755-329. mis 



<220> 

<221> primer_bind 

<222> 98964. .98982 

<223> 99-12755-329. mis complement 



<220> 

<221> primer_bind
<222> 103574. .103592 
<223> 4-87-212. mis 



<220> 

<221> primer_bind 

<222> 103594. .103612 

<223> 4-87-212. mis complement



<220> 

<221> primer_bind
<222> 104379. .104397
<223> 99-12757-318. mis 

<220> 

<221> primer_bind 

<222> 104399. .104417 

<223> 99-12757-318. mis complement

<220> 

<221> primer_bind
<222> 106354. .106372
<223> 99-12758-102. mis

<220> 

<221> primer_bind

<222> 106374. .106392

<223> 99-12758-102. mis complement

<220> 

<221> primer_bind
<222> 106388. .106406 
<223> 99-12758-136. mis 

<220> 

<221> primer_bind

<222> 106408. .106426 

<223> 99-12758-136. mis complement 

<220> 

<221> primer_bind
<222> 108296. .108314 
<223> 4-105-98. mis 

<220> 

<221> primer_bind 
<222> 108308. .108326 
<223> 4-105-86. mis 

<220> 

<221> primer_bind 

<222> 108316. .108334 

<223> 4-105-98. mis complement 

<220> 

<221> primer_bind

<222> 108328. .108346

<223> 4-105-86. mis complement

<220> 

<221> primer_bind
<222> 108453. .108471 
<223> 4-45-49. mis 

<220> 

<221> primer_bind

<222> 108473. .108491 

<223> 4-45-49. mis complement 

<220> 

<221> primer_bind
<222> 109177. .109195 
<223> 4-44-277. mis 
<220>

<221> primer_bind 

<222> 109197. .109215 

<223> 4-44-277. mis complement

<220> 

<221> primer_bind
<222> 114585. .114603 
<223> 4-86-60. mis 

<220> 

<221> primer_bind 

<222> 114605. .114623 

<223> 4-86-60. mis complement

<220> 

<221> primer_bind 
<222> 115697. .115715 
<223> 4-84-334. mis 

<220> 

<221> primer_bind 

<222> 115717. .115735 

<223> 4-84-334. mis complement 

<220> 

<221> primer_bind 
<222> 122064. .122082 
<223> 99-78-321. mis 

<220> 

<221> primer_bind 

<222> 122084. .122102 

<223> 99-78-321. mis complement

<220> 

<221> primer_bind 
<222> 123105. .123123 
<223> 99-12767-36. mis 

<220> 

<221> primer_bind

<222> 123125. .123143

<223> 99-12767-36. mis complement

<220> 

<221> primer_bind 
<222> 123212. .123230 
<223> 99-12767-143. mis 

<220> 

<221> primer_bind

<222> 123232. .123250

<223> 99-12767-143. mis complement

<220> 

<221> primer_bind 
<222> 123258. .123276 
<223> 99-12767-189. mis 

<220> 

<221> primer_bind 
<222> 123278. .123296 
<223> 99-12767-189. mis complement



<220> 

<221> primer_bind
<222> 123449. .123467 
<223> 99-12767-380. mis 



<220> 

<221> primer_bind 

<222> 123469. .123487 

<223> 99-12767-380. mis complement



<220> 

<221> primer_bind 
<222> 126719. .126737 
<223> 4-80-328. mis 

<220> 

<221> primer_bind

<222> 126739. .126757 

<223> 4-80-328. mis complement 

<220> 

<221> primer_bind 
<222> 128191. .128209 
<223> 4-36-384. mis 



<220> 

<221> primer_bind

<222> 128211. .128229 

<223> 4-36-384. mis complement 



<220> 

<221> primer_bind
<222> 128311. .128329 
<223> 4-36-264. mis 



<220> 

<221> primer_bind 
<222> 128314. .128332 
<223> 4-36-261. mis 



<220> 

<221> primer_bind

<222> 128331. .128349 

<223> 4-36-264. mis complement 

<220> 

<221> primer_bind 

<222> 128334. .128352 

<223> 4-36-261. mis complement 

<220> 

<221> primer_bind 
<222> 128575. .128593 
<223> 4-35-333. mis 



<220> 

<221> primer_bind

<222> 128595. .128613 

<223> 4-35-333. mis complement 



<220> 
<221> primer_bind
<222> 128668. .128686 
<223> 4-35-240. mis 

<220> 

<221> primer_bind

<222> 128688. .128706 

<223> 4-35-240. mis complement 

<220> 

<221> primer_bind 
<222> 128735. .128753 
<223> 4-35-173. mis 

<220> 

<221> primer_bind

<222> 128755. .128773

<223> 4-35-173. mis complement

<220> 

<221> primer_bind 
<222> 128775. .128793 
<223> 4-35-133. mis 

<220> 

<221> primer_bind

<222> 128795. .128813 

<223> 4-35-133. mis complement 

<220> 

<221> primer_bind
<222> 130786. .130804 
<223> 99-12771-59. mis 

<220> 

<221> primer_bind

<222> 130806. .130824

<223> 99-12771-59. mis complement

<220> 

<221> primer_bind 
<222> 133187. .133205 
<223> 99-12774-334. mis 

<220> 

<221> primer_bind

<222> 133207. .133225

<223> 99-12774-334. mis complement

<220> 

<221> primer_bind 
<222> 135367. .135385 
<223> 99-12776-358. mis 

<220> 

<221> primer_bind 

<222> 135387. .135405 

<223> 99-12776-358. mis complement 

<220> 

<221> primer_bind 
<222> 139370. .139388 
<223> 99-12781-113. mis 
<220>

<221> primer_bind

<222> 139390. .139408 

<223> 99-12781-113. mis complement 

<220> 

<221> primer_bind
<222> 157516. .157534 
<223> 4-104-298. mis 

<220> 

<221> primer_bind

<222> 157536. .157554

<223> 4-104-298. mis complement

<220> 

<221> primer_bind 
<222> 157560. .157578 
<223> 4-104-254. mis 

<220> 

<221> primer_bind
<222> 157564. .157582
<223> 4-104-250. mis 

<220> 

<221> primer_bind

<222> 157580. .157598

<223> 4-104-254. mis complement

<220> 

<221> primer_bind

<222> 157584. .157602

<223> 4-104-250. mis complement

<220> 

<221> primer_bind
<222> 157600. .157618 
<223> 4-104-214. mis 

<220> 

<221> primer_bind

<222> 157620. .157638

<223> 4-104-214. mis complement

<220> 

<221> primer_bind
<222> 172961. .172979
<223> 99-12818-289. mis 

<220> 

<221> primer_bind 

<222> 172981. .172999 

<223> 99-12818-289. mis complement 

<220> 

<221> primer_bind
<222> 180603. .180621 
<223> 99-24807-271. mis 

<220> 

<221> primer_bind 
<222> 180623. .180641

<223> 99-24807-271. mis complement

<220> 

<221> primer_bind
<222> 180790. .180808 
<223> 99-24807-84. mis 

<220> 

<221> primer_bind 

<222> 180810. .180828 

<223> 99-24807-84. mis complement

<220> 

<221> primer_bind
<222> 190315. .190333 
<223> 99-12831-157. mis 

<220> 

<221> primer_bind 

<222> 190335. .190353 

<223> 99-12831-157. mis complement

<220> 

<221> primer_bind 
<222> 190399. .190417 
<223> 99-12831-241. mis 

<220> 

<221> primer_bind

<222> 190419. .190437 

<223> 99-12831-241. mis complement 

<220> 

<221> primer_bind
<222> 191378. .191396 
<223> 99-12832-387. mis 

<220> 

<221> primer_bind 

<222> 191398. .191416 

<223> 99-12832-387. mis complement

<220> 

<221> primer_bind 
<222> 195109. .195127 
<223> 99-12836-30. mis 

<220> 

<221> primer_bind

<222> 195129. .195147

<223> 99-12836-30. mis complement

<220> 

<221> primer_bind
<222> 203827. .203845 
<223> 99-12844-262. mis 

<220> 

<221> primer_bind

<222> 203847. .203865

<223> 99-12844-262. mis complement
<220>

<221> primer_bind
<222> 210132. .210150 
<223> 4-24-74. mis 

<220> 

<221> primer_bind

<222> 210152. .210170 

<223> 4-24-74. mis complement 

<220> 

<221> primer_bind 
<222> 210302. .210320 
<223> 4-24-246. mis 

<220> 

<221> primer_bind

<222> 210322. .210340 

<223> 4-24-246. mis complement 

<220> 

<221> primer_bind 
<222> 210370. .210388 
<223> 4-24-314. mis 

<220> 

<221> primer_bind

<222> 210390. .210408 

<223> 4-24-314. mis complement 

<220> 

<221> primer_bind 
<222> 211149. .211167 
<223> 4-27-190. mis 

<220> 

<221> primer_bind 

<222> 211169. .211187 

<223> 4-27-190. mis complement

<220> 

<221> primer_bind
<222> 215977. .215995 
<223> 5-400-145. mis 

<220> 

<221> primer_bind 
<222> 215981. .215999 
<223> 5-400-149. mis 

<220> 

<221> primer_bind 

<222> 215997. .216015 

<223> 5-400-145. mis complement

<220> 

<221> primer_bind

<222> 216001. .216019 

<223> 5-400-149. mis complement 

<220> 

<221> primer_bind
<222> 216007. .216025
<223> 5-400-175. mis



<220> 

<221> primer_bind

<222> 216027. .216045

<223> 5-400-175. mis complement



<220> 

<221> primer_bind
<222> 216063. .216081
<223> 5-400-231. mis



<220> 

<221> primer_bind 

<222> 216083. .216101

<223> 5-400-231. mis complement



<220> 

<221> primer_bind

<222> 216199. .216217

<223> 5-400-367. mis 



<220> 

<221> primer_bind

<222> 216219. .216237

<223> 5-400-367. mis complement



<220> 

<221> primer_bind
<222> 216303. .216321
<223> 99-12852-110. mis 



<220> 

<221> primer_bind

<222> 216323. .216341 

<223> 99-12852-110. mis complement 



<220> 

<221> primer_bind
<222> 216518. .216536 
<223> 99-12852-325. mis 



<220> 

<221> primer_bind 

<222> 216538. .216556 

<223> 99-12852-325. mis complement



<220> 

<221> primer_bind
<222> 221630. .221648 
<223> 4-37-326. mis 



<220> 

<221> primer_bind

<222> 221650. .221668 

<223> 4-37-326. mis complement 



<220> 

<221> primer_bind
<222> 221848. .221866 
<223> 4-37-107. mis 



<220> 
<221> primer_bind

<222> 221868. .221886

<223> 4-37-107. mis complement

<220> 

<221> primer_bind

<222> 225626. .225644 

<223> 5-270-92. mis 

<220> 

<221> primer_bind

<222> 225646. .225664 

<223> 5-270-92. mis complement 

<220> 

<221> primer_bind
<222> 229368. .229386
<223> 99-12860-47. mis 

<220> 

<221> primer_bind
<222> 229378. .229396 
<223> 99-12860-57. mis 

<220> 

<221> primer_bind 

<222> 229388. .229406 

<223> 99-12860-47. mis complement

<220> 

<221> primer_bind 

<222> 229398. .229416 

<223> 99-12860-57. mis complement

<220> 

<221> primer_bind
<222> 237536. .237554
<223> 5-402-144. mis

<220> 

<221> primer_bind

<222> 237556. .237574

<223> 5-402-144. mis complement

<220> 

<221> misc_binding 
<222> 1987. .2011 
<223> 5-390-177. probe 

<220> 

<221> misc_binding 
<222> 4589. .4613 
<223> 5-391-43. probe 

<220> 

<221> misc_binding 
<222> 10216. .10240 
<223> 5-392-222. probe 

<220> 

<221> misc_binding 
<222> 10274. .10298 
<223> 5-392-280. probe 
<220>

<221> misc_binding 
<222> 39932. .39956 
<223> 4-58-318. probe 

<220> 

<221> misc_binding 
<222> 39961. .39985 
<223> 4-58-289. probe 

<220> 

<221> misc_binding
<222> 41373. .41397
<223> 4-54-199. probe

<220> 

<221> misc_binding 
<222> 41392. .41416 
<223> 4-54-180. probe

<220> 

<221> misc_binding
<222> 42220. .42244 
<223> 4-51-312. probe 

<220> 

<221> misc_binding
<222> 67463. .67487
<223> 99-86-266. probe

<220> 

<221> misc_binding
<222> 69509. .69533
<223> 4-88-107. probe

<220> 

<221> misc_binding 
<222> 72826. .72850 
<223> 5-397-141. probe 

<220> 

<221> misc_binding
<222> 76048. .76072
<223> 5-398-203. probe

<220> 

<221> misc_binding
<222> 81241. .81265 
<223> 99-12738-248. probe 

<220> 

<221> misc_binding
<222> 83909. .83933 
<223> 99-109-358. probe 

<220> 

<221> misc_binding
<222> 91905. .91929 
<223> 99-12749-175. probe 

<220> 

<221> misc_binding 
<222> 95337. .95361
<223> 4-21-154. probe 

<220> 

<221> misc_binding 
<222> 95499. .95523 
<223> 4-21-317. probe 

<220> 

<221> misc_binding
<222> 96178. .96202 
<223> 4-23-326. probe 

<220> 

<221> misc_binding 
<222> 97282. .97306 
<223> 99-12753-34. probe 

<220> 

<221> misc_binding 
<222> 98012. .98036
<223> 5-364-252. probe

<220> 

<221> misc_binding 
<222> 98902. .98926 
<223> 99-12755-280. probe 

<220> 

<221> misc_binding 
<222> 98951. .98975 
<223> 99-12755-329. probe 

<220> 

<221> misc_binding 
<222> 103581. .103605 
<223> 4-87-212. probe 

<220> 

<221> misc_binding
<222> 104386. .104410 
<223> 99-12757-318. probe 

<220> 

<221> misc_binding 
<222> 106361. .106385 
<223> 99-12758-102. probe 

<220> 

<221> misc_binding 
<222> 106395. .106419 
<223> 99-12758-136. probe 

<220> 

<221> misc_binding 
<222> 108303. .108327
<223> 4-105-98. probe 

<220> 

<221> misc_binding
<222> 108315. .108339 
<223> 4-105-86. probe 
<220>

<221> misc_binding 
<222> 108460. .108484 
<223> 4-45-49. probe 

<220> 

<221> misc_binding 
<222> 109184. .109208
<223> 4-44-277. probe 

<220> 

<221> misc_binding 

<222> 114592. .114616 

<223> 4-86-60. probe

<220> 

<221> misc_binding 
<222> 115704. .115728 
<223> 4-84-334. probe 

<220> 

<221> misc_binding 
<222> 122071. .122095
<223> 99-78-321. probe 

<220> 

<221> misc_binding 
<222> 123112. .123136 
<223> 99-12767-36. probe 

<220> 

<221> misc_binding 
<222> 123219. .123243 
<223> 99-12767-143. probe 

<220> 

<221> misc_binding 
<222> 123265. .123289 
<223> 99-12767-189. probe 

<220> 

<221> misc_binding 
<222> 123456. .123480 
<223> 99-12767-380. probe 

<220> 

<221> misc_binding
<222> 126726. .126750
<223> 4-80-328. probe

<220> 

<221> misc_binding 
<222> 128198. .128222 
<223> 4-36-384. probe 

<220> 

<221> misc_binding
<222> 128318. .128342
<223> 4-36-264. probe

<220> 

<221> misc_binding
<222> 128321. .128345 
<223> 4-36-261. probe
<220> 

<221> misc_binding
<222> 128582. .128606 
<223> 4-35-333. probe 

<220> 

<221> misc_binding 
<222> 128675. .128699 
<223> 4-35-240. probe 

<220> 

<221> misc_binding 
<222> 128742. .128766
<223> 4-35-173. probe 

<220> 

<221> misc_binding 
<222> 128782. .128806 
<223> 4-35-133. probe 

<220> 

<221> misc_binding 
<222> 130793. .130817 
<223> 99-12771-59. probe 

<220> 

<221> misc_binding
<222> 133194. .133218 
<223> 99-12774-334. probe 

<220> 

<221> misc_binding
<222> 135374. .135398 
<223> 99-12776-358. probe 

<220> 

<221> misc_binding 
<222> 139377. .139401 
<223> 99-12781-113. probe

<220> 

<221> misc_binding
<222> 157523. .157547 
<223> 4-104-298. probe 

<220> 

<221> misc_binding
<222> 157567. .157591
<223> 4-104-254. probe

<220> 

<221> misc_binding
<222> 157571. .157595 
<223> 4-104-250. probe 

<220> 

<221> misc_binding 
<222> 157607. .157631 
<223> 4-104-214. probe

<220> 
<221> misc_binding
<222> 172968. .172992 
<223> 99-12818-289. probe 

<220> 

<221> misc_binding
<222> 180610. .180634 
<223> 99-24807-271. probe 

<220> 

<221> misc_binding
<222> 180797. .180821
<223> 99-24807-84. probe 

<220> 

<221> misc_binding 
<222> 190322. .190346 
<223> 99-12831-157. probe 

<220> 

<221> misc_binding 
<222> 190406. .190430 
<223> 99-12831-241. probe 

<220> 

<221> misc_binding 
<222> 191385. .191409 
<223> 99-12832-387. probe 

<220> 

<221> misc_binding 
<222> 195116. .195140 
<223> 99-12836-30. probe 

<220> 

<221> misc_binding
<222> 203834. .203858
<223> 99-12844-262. probe 

<220> 

<221> misc_binding
<222> 210139. .210163
<223> 4-24-74. probe

<220> 

<221> misc_binding 
<222> 210309. .210333 
<223> 4-24-246. probe 

<220> 

<221> misc_binding 
<222> 210377. .210401 
<223> 4-24-314. probe 

<220> 

<221> misc_binding 
<222> 211156. .211180 
<223> 4-27-190. probe 

<220> 

<221> misc_binding
<222> 215984. .216008 
<223> 5-400-145. probe 
tgtagcagca acatcactgc ctgtggaggt
aatgatgttg aaagggaaga aggaaaacgg 
acattcctga agtatgaagg cactttttgg 
atgatgatgt acctattctc ttatttgaat 
ttgaaattaa tagtagtcac cacagcgcaa 
aaagggaaaa tctttccccc acctgtaagt 
tagccgattc aattatggtg gaaagcttct 
ctggtagtaa cacattttga cttatttcat 
catcattaag atttttgaag catatgttgg 
atttgttgta gctcctttgt aatttctatt 
aaaaaaaaaa tgaatcatgt cttttttttt 
ccaggctgga gtgcagtggc gcgatctggg 
ctactgcctc agcctcccga gtagctggga 
tttttttttg tttttttgta tttttagtag 
gtcttaaact cttaacctca ggtgatacac 
taatccaggc gtgagccacc gctcctggcc 
tagattaatg tatctaagga atcagtttgt 
ttaattactg aagtgtaatt cacattttaa 
aagttcatct tgatacccat ctaattgtac 
gtttaagtgt tttcccagat ctgtttcagt 
atacgggttt ggtttctgtg tggaggtgta 
atatgtatta ttatccgctt tatttactta 
agttgttgta gacctatttt tgttttaaga 
aaaatctatt ggtgggtatt tttttcccag 
tttttttaac catatggttt ggttgttctt 
agacataagt ctttccagct tcccaccccg 
ctacgtgaat ccttgtattt ctgagtactt 
gcagaatttg attttttaaa gacatagagt 
tcctgggctc aagtacttct gcctcaccct 
acgatccctg gcttattgat agatatagtc 
tattgtcacc agttgtttat aagaatgccc 
ttagcctgta gcctttttca gtcggaagct 
tttgattaga agcaggttta agtgcctttt 
ttttaatgtg atcaactatt acctcgctta 
taaggttagt ttcggctgca tataacaaag 
ttgcctttct ctgtcttgcc tagttcagaa 
tggtgtcttt aagaaactag gttcccatct 
aaatgtgtgt gcctcccagc taagccatct 
aatattcctg tctaattcca ttggctggaa 
aagactgaaa tgtagtcttg actgggatgc 
tgttaagaag aagtgagaat ggacattgag 
tccaaattgt ttcagtgtgc aattatgtgt 
tctattactt ggcagtgtta attctgctac 
actgaaaatt ttatgtgtgc ttcccttcct 
agttttctca gaaaagtatc cttttgagtc 
aacatttctt ttgtttcttg actattgtat 
ttatttgtta agtcattgac atcctaagtg 
ccagcacagc tgacatttgg gtctgggtaa 
gtaggaagct cagcagcatc cctgcctctc 
ctccctcact ggcgatatcc aaaaatgtgt 
caacgtcacc tctggttggg aagcagtgct 
cttgaagaaa aatccatgat tatcaaataa 
taattctgtt gagcttctga ttagattcag 
tgtgattata ggatttaata gaatcaccta 
gtctgctaag aaatactgaa actttatcta 
ccaaatgatt cagcagtctc atgataatcc 
catttcacgt gatactttgt gttcaggtaa 
aagttttgag aataatatga ttttctgatt 
aaaataatta gttaaaacta gaacttctaa 
aattaaggca tgagttaaac ttcctttttg 
tgctttcttg ctacagaatc cattgctcta 
taatctgtat gcagtttaaa ctacatagaa 
acacaatgta taattaacac aaggaacctg 
ggtggaggta gaatattagc aggagtaggt 34140

ggtgtggggg gttgttcttt aaaaggaatc 34200 

tcttaaagtg gattttttgt ttattttcag 34260 

ctaatggttc attaatatat actcccacaa 34320 

tggagaagag attacaagag atgaaggaga 34380 

aattagtttg taaaatgaaa attatgcaaa 34440 

tttttctttg cctagatatt ttaatgtttc 34500 

ggctggcttt gttttccaga aaatcttatg 34560 

gtgtatagta ttcttcaagt ttaaaatcct 34620 

atctttggaa ttttttcttt cttttttttt 34680 

ttttctgaga tggagttttg catttgtcac 34740 

ctcactgcaa cctccctagt tcaagtgatt 34800 

ttacaggcgc ctgtcaccac tcctggctaa 34860 

agacggggtt tcaccatgtt ggtcaggctg 34920 

ccgcctcggc ctcccaaacg gctgggactg 34980 

gtgaatcatg tcttttgaag gaatttgctt 35040 

ttttcattat ttcttttatc tttaaaattt 35100 

taaaacattt atcaaagtag ctaatagtaa 35160 

tcttctacct gggggtaacc tgtattttaa 35220 

gtatcagata tctgtgtata catgaaaaag 35280 

atttctgttt tacctaaatt agataatgac 35340 

agagtatcct ggagggtttg tttgcagctt 35400 

tgctcaaagt agtctacagt tttgatattg 35460 

ttattagaaa ttgtgttgca gtttttattc 35520 

gtttttttgt taagccattt tcctttctct 35580 

actttttact gttataaccc ctgcatgtgc 35640 

cgtgtatttc aataatacta attcatacat 35700 

ctccctgtgt tgcgcaggca ggacatgcac 35760 

ctcaagtagc taggaataca ggtgtgtgcc 35820 

aaattatcct tcaaaaaatt tgagtcatct 35880 

ctttctccat acttggaaaa ctgaatggca 35940 

tgaaaaactg gatctgttct tgaagttact 36000 

catattactg actgacttac cgaatgcagc 36060 

attttatgtc ctttgtccat ctgtatcagt 36120 

acaaaaacca atgtgttaca atcgatagaa 36180 

gtaggcagcc agggctggga tgccattcca 36240 

ttctgttgta cctgcctggc ttttcttgca 36300 

ccttttgaca gccttaccag acgtctatcc 36360 

tgtggtcata tggccacccc ttttgcaagc 36420 

attgctgtcc tgataaaatc aaagttctgt 36480 

gtagataact agctgtgtcc caggtggaca 36540 

ataaactaat ttgccttaaa ctttactttt 36600 

tttactgcgt ccagtacagt ttaaaactta 36660 

tatcttggtt tattctcttt tttttgctga 36720 

tctaaaaaat atctttggat ataagatcca 36780 

gaaccgcctt tgaagataat acttacgatc 36840 

ttttctatga aacctctagg atttctcaac 36900 

ttctttgttg ggggcactgc cctgtgtgtg 36960 

cccactaaca ctagcagtgt acctactgct 37020 

ccagacatta ccaaatatct gctgggaccc 37080 

ctagttttag aggtaactat gatgagcatc 37140 

gaagactaga acagactgga aatgttcact 37200 

gcaagttgac tttaagatcc cttctaactt 37260 

tgattaatag gaggacttcc tgctggcttc 37320 

atgcagtgtc ttggtcctgt ttttagcttc 37380 

aagtaactct ctgtgtgaag cacctttgaa 37440 

aatttttatt ttcctttctg tgatatgttt 37500 

tagaatttca tgtagcaact tctgatgagt 37560 

atttccccct gaaattaggt attataataa 37620 

gttcctatag gttttttttt cctaggcatt 37680 

tttaaaaaat tattgtgaac gtatatgaac 37740 

ctgaggtcag agctaaggaa atgttgtttc 37800 

ttattgaacg gggtcagtga agtatgtaaa 37860 

<220>

<221> misc_binding
<222> 215988..216012
<223> 5-400-149.probe

<220>

<221> misc_binding
<222> 216014..216038
<223> 5-400-175.probe

<220>

<221> misc_binding
<222> 216070..216094
<223> 5-400-231.probe

<220>

<221> misc_binding
<222> 216206..216230
<223> 5-400-367.probe

<220>

<221> misc_binding
<222> 216310..216334
<223> 99-12852-110.probe

<220>

<221> misc_binding
<222> 216525..216549
<223> 99-12852-325.probe

<220>

<221> misc_binding
<222> 221637..221661
<223> 4-37-326.probe

<220>

<221> misc_binding
<222> 221855..221879
<223> 4-37-107.probe

<220>

<221> misc_binding
<222> 225633..225657
<223> 5-270-92.probe

<220>

<221> misc_binding
<222> 229375..229399
<223> 99-12860-47.probe

<220>

<221> misc_binding
<222> 229385..229409
<223> 99-12860-57.probe

<220>

<221> misc_binding
<222> 237543..237567
<223> 5-402-144.probe



<400> 1 

tctccccaaa ttcatctgta gagtcaacac aatctcaatc aaaatcccag cagtattttt 60 

ttgtgcaaaa tgagaagtcg actctaagat ttaaaatgaa atctgaagaa tctagaagat 120

acaaaataac cttgaaaaat aaagttgtag gacataaact atctgatttc atcacttatt 180 

tataagctac aataatcaaa acagcatggt gctggcagca aaaagacaaa tagctcaatg 240 

gaacacaata ggaagcctaa aatgaaacac atacatatgc aacacagatt ttgatgtaag 300 

cacaaaggaa atgcagtaga gacaaaaata actttttaat aaatgatgct ggaacatttg 360 

gatatgtata catgcaaaaa aatgaacttt ggtccctatc ccataccgta tacaaaaatt 420 

aattaaaagc agatcttatc ctttgagtcc agtaggttga ggctgcagtg agctgtgatt 480 

acaccactgc attccagcct gggcaacgga gtgagaacct gcctggagaa aaaaaaaaaa 540 

aagtagaacc tagacctgat atacaaccta aagcagtaat atttctagaa gaaatcctag 600 

gagaaaatat ttgtgatcgt ggagatgaag aatctatcaa atactaaact ttttttacca 660 

ccttgaccaa aagtaattgg tttatatact tcatcatatc atttaattct aaatctacag 720 

agatcaatgt cactttctca gtaaaagtac gtgagtcttc aatgatgccc tgaactcaca 780 

ctcccaagta aaccataaca ccatatttcc agagtagagt ttattagaac aataactggt 840 

gataatgata aatattgatc aaagactgag cctaggaagt gggttttttg aggctgcata 900 

tactcaaggc aattcttcag aaccacagag ggctcattgg atcctattaa aagctgagag 960 

ttaatgaata aacagataaa acagagacct gagtagacgg tagtcgatat tcttgtacat 1020 

gtattctacc tctagattcc atagaaagaa ctaaaagtac atgaatttca ctaccaacat 1080 

ctccatcagt taccagctgt atcaccttgg atcagtcagg taacctcccg cgaatttgct 1140 

tccggggcag gggatcgcgc tgcaggtttg agcctgggag ccggcagggt ggagcagttg 1200 

gagggccaag cctttgagct ccaggggggg tggccgggac agtgggtagt gccagccgat 1260 

cggcgtcctg gggattgcct gaatgtgagg tctgggttca ccccgcggtg acctgagtcc 1320 

tgggatgccc ctacagtgat ttgctgcctc agggatccga agtctctttc attcccttac 1380 

tggggatttg aggtctggag gtactcctgc gggggtctga gatctcgggg tcaccctgtg 1440

ggggtctgaa gcctcgggtc cccgctgggg tctgaggtat cagagtcccc tccgttgggt 1500 

ctgaggtctc ggggtccccc atccccggga tcggaggtcc ggctccccgg agcaggcagg 1560 

gcggtgcgtc tggccctgac agtaacgtgg cgcgccagcc ccaggtggtg tcgggctagg 1620 

ggggcataac ggtgccgaaa gtccgcacaa agccgtccgc tggggtcccg ccgcgcccgc 1680 

gaggcaatga ctgtgccccc tccccttcct gatcctcagc tcaggtgagc ccagatgagg 1740

cgccgggtag cttctaagtc actaatggaa atagaaggct aattcagggg ttaggggccg 1800 

tcgtccttct tactcgcagg agaagagaaa aacccacggc ccagcagcca gaggcgcggc 1860 

gaggcggaat cgggccccct ccccgggggc tcagctccct ccagcctccc gcctcaccta 1920 

cagagaaatc ccggaaacgc ggattcagcg gagcgcggtg acggcggcgc gctcaccccg 1980 

cgcatgccca gtgcccgcsc gcgccgccag gctcgcaagc accgcgtagg ccagctggcc 2040

ggatcccgcc gtctgtcatg gcggccccca tcctgaaagg tgaggtactt cctgctgcct 2100 

gctccagcag cgggagtttg aggaccggca cccctcgtcg cgggcgcact cgggggatcc 2160 

cgtgggagga gccccgctcg cccctccctc gctgcctgtc tcccccagac cccctgccgc 2220 

ctccttcctc ccccgctgcc tgtcccccca aaacccccgg ctgcctgctt cgtctcccgt 2280 

gctccctgtc cccccaaacc cccgactgcc tgcttcctcc cccgtactgc ttgtgcccca 2340 

acccccgtgc tgctagttcc cctcaatccc ccgctgcctg ctccctcccc catgctgcct 2400 

gtcccccaaa tcccgccttt ccccctacct gctttcaccc ctgctgcctt agtccctgga 2460 

tctggggctc actggcaggc agagtcctgc cctccggaag ttggtgtggg gccctcctgg 2520 

gtctggtcct gttcgacccc ctctgaggcc cacctggagg agcggcagtt gagtttctat 2580 

gctaattgtt ccaataatag gagccgcctt ttactgcgga gtctttgtgt gccaggcgct 2640 

gtgcttaggc tagtatggta ttgtctgatt tttttaaccg ctctatcaac tctcttatat 2700 

cattgtacag gcagaaacta aggcattgga cgtttaggtg actctccctg tgtgtggcta 2760 

gtcagtgctg acagggcctt agaccggagc tgctgtccta accagtatat gataccgcac 2820 

gcagtcccac cctctgtgca cctggaagag cccaggagag gggaatagcg gacacgtgtc 2880 

ttgtagagtt tgaccgtgag aaaaaagggg cctgtattgt ggggcctgca gtcataaaac 2940 

ctcatagcca aaagtaaaga ctagaggctt tatacaaagt ctgtaatcag atgtggctat 3000 

ttttctaatg ttagtatttt gttaaattaa cctggttttc ttttagcgtt acccccaatc 3060 

attgaccaac ggcacacctg gaaaatgctt ttaaacatca ggttttgaga agaggatatc 3120 

cactagaaca ggggtccact cactatgccc cccaggccat atctagcctg ctgcctgttt 3180 

ttgtaagggt ctacgagcta agaatgtctt ttacattttt aagtgatttt aaaaaaaggt 3240 

caaatgaaaa attatatcac attcacattt ccttctccat aaataaagtt ttattggaac 3300 

acaggccggc ccgttaatat attacctatg gttatgtttg tgccacaacc gtgaagttga 3360 

gtagttgtgg caaatactgt attggccaca aagcctgaaa tatttaccat ctgtctcttt 3420 

acagaaaata ggtttctgca ctggaaaaat taagcgtaag aatttgggga aagcaactaa 3480 

ttttacaaat gtaaactctc atgtattgta tgggtacagt tgttctttgc ttaaaatttt 3540 

aataaattcc actgaagcta ttttgaaaag gctttcagta gaaatttatt tatgagacag 3600 

agtcttactc tcttgcccag gctggagcgc agtgatgtga tcacataata gctcaagcaa 3660 

ttctgcttca gcctcctgag taacttggga ctacaggcac taccatgccc ggttattttt 3720 

atttttattt tttagtttat tatttttttg tagagccagg gtctcactat gttgcctagg 3780 

ctggtcttga attcctagcc tcaagcaatc ctcccgcctc caccttgcaa aatgctggga 3840 

ttacaggcat gagctacttt gttcagccag tagaagaaac ttcatttact tttcttattt 3900

ttgaggcaag gtctttctct gctgcccagg ctggagtgca atggtgcgat cataactcag 3960 

cttctacctc ctgggctcta gggattctcc cacctcagct tctccaccct acccaccccc 4020 

atttcccacc cagtagctgg gactacagcc actcgccacc attcctggct aattaaaaac 4080 

aaaatttttt ttagagacag ggtttcacta tgttgcccag gctggtctca aactcctgtg 4140 

cccaagtgat cccactgcct tggccttcca gagtgctgca attacagcat gagccaccac 4200 

acctggccag tagagtaaat ttttgtttta cttttttctt ttttttattt ttgaaacggg 4260 

tctcgccctg tcacccaggc tggagtgcaa tggcgcaatc tcggctcact gcaacctctg 4320 

cctcccgggt tcaagtgatt ctcctgcctc agcctcccag tagctgggat tacaggtgcc 4380 

cgccaccatg ctcggctaat tttttgtatc ttttagtaga gatggttttt caccatgttg 4440 

gcccggctgg tctcaaaccc ctgacttcgt ggatccaccc acttccgcct cccacagtgc 4500 

tgggattaca ggcgtgagcc actgtgccgg cctcggttta ctcttaaatg taaatagaac 4560 

aaaatctatt gggcagggga tgctggaatt tcaaatgtat rtttcatgtt catatcttgt 4620 

tttcagatgt agtggcctat gttgaagtgt ggtcatccaa tggaacagaa aattattcaa 4680 

agacatttac aacacagctt gtggatatgg gggcaaaggt aagacactta ttttgctgtt 4740 

gattcatatg acagtcttct gattggtaaa aagttacatt tgcattttct tattttggga 4800 

gtttttactt agaatctgga cgaagcaatg ggtaagcggt gggagaaaaa agagccaaag 4860 

tgtgaagaat ttagaacagt aggactttca gaactcaatg cctgtgggca ttgagtgagg 4920 

aggaggaacc taggatgaaa tgctggattc ttacactggt tacttgaatg catagtgcta 4980 

ttaagcaaag tgaggaatac aggaaaagga acaggtttct aagggaaaaa ttgtaaattt 5040 

gggcatactg aaaatatctg ttagatattt ggatatacaa gtctggagct tggagtgttc 5100 

aaggctagag atgatgatct agggggtcag gaccataggg gtcatgtgaa gtcacaggtg 5160 

tggacatcgt cccatgtcag gcatggttag gatgaagagt ggtgacagag gagcgttgtt 5220 

cagtattcaa ggacaggcga tgggagcagg gacccagtga cagagggaga gaagaatgcc 5280 

aggagaagga gaaaggaagt gtggaagtca aagtagggag taattttttt tttttgagac 5340 

ggagtttcgc tctgtcgcta ggctggagtg cagtgacgcg atctcagctc actgcaatct 5400 

ctgccttctg ggttcaagcg attgtcctgc ctcagccttc caagtatctg ggactacagg 5460 

cacatgccac catgcctagc taattttttt ttttgtattt ttagtaaaga cggggtttca 5520 

ccatgttggc caggatggtc tcaatctcct gatctcgtga tccgcccacc tcggcctccc 5580 

aaagtgctgg gattacaggc atgagccacc gagcccggcc aggagtaatt ttttaattgc 5640 

ctttcagaac tagaatggag taattttaaa gatagaattt ttaaaaacta cagaaagttc 5700 

aagaaaaata ggatgggcaa atgtactttg gatttgaaca ctgtaaggtc attgctgaac 5760 

ttagtgcagt tttcagtgaa atgggcagga atcattgagc tatgaggaaa tggagatagc 5820 

aaacaatttg ccttattcaa ggtttcttag tatagccatc tctgttatca gatttactat 5880 

cacgtactgc ttgtgttcag gtagcctcta tttgacttaa taatgtcctt gataccaaat 5940 

aggtatcttt tgcccacgca cactaaaccg atcactttga tgacgggttt tacaaaaggg 6000 

aaaagattca ttcacaggga agcccagcta ggaggcagaa gagtactcac atcttcattc 6060 

ccaaagataa ggcttaggga tatttatcag ttagggaagt agggtgatct aagctgtggg 6120 

gaaaaatgaa gtacatgatc tgcacaagca tagttgggat tcatggaatg catgtttaga 6180 

aaacaggcat tattaggagg ccaaggcagg cggatcacct gaggtcagga gttcgagacc 6240 

agcctggcca acatagtgaa accccatctc tactaaaaat acaaaaaaaa gccaggtgtg 6300 

gtggcacaca cctgtagtct cagtgattcg ggaggctgag gcaggagaat cgtttgaacc 6360 

tgggaggcgg aggttgcatt gagccgagat tgcaccactg cactccagcc tgggcgacgg 6420 

agtaagattc tgtctccaaa aaccaaaaaa ataggcacta gtaggatccg atggtgaaga 6480 

ttttggcctg atgtcaaaag gtcatttctt gggcatttac acaggcctgg ttgaagagtt 6540 

ggtggttgca gcctgtttga actgtacggg tgctgcccca agttcctgaa aagtaactta 6600 

agcaactgtt accgtggtga catatccacc agaagttttt atcttataag gaagccagtg 6660 

aaggttatag catttagtag tatgacttgc agctatatag aaataaataa ataaataaca 6720 

aaaagcaagt gaccaaaagc aagcaaggca ggttaaattt ggcagaacta attttcagcc 6780 

gtaaagtgca agagtgatga tgctggcaat tcagatatgc cagagaagcc ttaaggtgct 6840 

ttaagtgaaa aggtgaaagt tctccacttt aaggaaagga agaaaattgt gtgttgaagt 6900 

tgctaagatc tacagtgaga acaaatcttc taatcttgaa attgtgaaga actatgctac 6960 

tgttgcagtc acaccaaact gcaacagtta cagccacagt gcgtgatttt tattataata 7020 

cattgctaca attaccctat tttgttatca ttattgttaa tctgtgccta atttgtaaat 7080 

aaaacttcat tgtatatgta tgtataggaa aaaacagtat ataacctgtt cagtactagc 7140 

tcaggattca ggcatccact gggaggggtt gggggcggga cgcgggcatg tcttagaact 7200 

taacccccgt ggataagggg gaactaatgt gctcttatag ggagtttagt tatgaacaaa 7260 

tcctgtttat gtccttgtct ggcatttggg aggggctgac tgataggctg agtgaaagag 7320 

aaccattaaa aatgggagaa aagataatcg aaggcagggc taggttaggg tggagcaaga 7380 

gagctgctgt gggtataaaa cttaagaggc gctcaccacc aggcaaagag tgggtgctac 7440 

tgaataccct aagagccttg tttgacctcc ctaatgcctg tcttgagtaa gaggtcagtg 7500 

gagaggaatc cgaatatagg agcagggcct gcactgcagg aggggagaca tgccccctgt 7560 

aatacactgg aatgtaggaa cccgagggag tctgcatgtt gcacatgcct aacatttact 7620 

tgggatgagg aggaactact gtgaataaga aaaaagccgt tagacaagtg agttgacaag 7680

gtggtttgag ggtagcatta agatcttaga tcttttagaa ctttttggtt tcacctttta 7740

tttcaaaaat tggcaaacag ttcaaagaat agtgaataca gatcaatagt cgttaacatc 7800

gtaagatttg gatatttgta gtactcgtaa tccgggatta tcttaagcca atttcaggat 7860

ttgagatgat ttaaaaccag actacaggcc tgtgagggtc attataactt ctgattcacc 7920

cttaatctag atgcagctct ttgggtctca gcgcaaggtg taggggtttt accaaacccc 7980

cttgtctctt tgaggcttct caattttcgt cctgtttagt gtgtactaaa tttgataaaa 8040

gccttgtggg aagatggtct caaatgctag actcatctct ctaggtgtca gtcttaatct 8100

agaatctcag cccggtaatt cttaattgcc ttgatagctc ccatggactt gatgggggtg 8160

tggaaattga gagagagaga gagttataaa agtaatacat atttattgtt taaaaagact 8220

aacgggcagt gccatgaaat tcacaatgaa aagaaagaga aaccagcaac gctttgcagt 8280

acatttcctt ttccattttt caaagacagc tactttcaaa tcatctgttt cttttggtat 8340

ttaccttcat atttccaagc attgtacata tattacttca gtataattga atgctataaa 8400

aatcatgcag atgagttctg cttctggaaa ggatacataa aaggtaaaat ttttgacacc 8460

atgactgtct gagcatgcct ttattatact gttacctttg actaatattt tagctgagtg 8520

taaaatgcta gcacaaatat tatttttcct tgaaggtatt tcccattgtt ttctcaattc 8580

cagactgctg ttgataagac tgattcagtt gtcacttatc attgtttgca tgtgatgtgt 8640

ctctatcctc ttcaccttga ttacccactc ttttaatatt tttccctttc gccaaccatg 8700

ctggaattct gtgataagct tggtgtagtg ctgttttcgt tcttgtgccg ggcctttgtg 8760

ggggattctt ttgatctaga atgcatatcc tttagtttga gaaacttttc tttgattatt 8820

tctttaataa tattttctcc attttgtgta ttcctgtatt ctttaacttc tgttgattgg 8880

ctgttggatc tcctggtctg agctcctgat gttcttgcct tttgtctcct gttgtccgtc 8940

ttctggttct tctcttctac taccagtgag cttttctcaa cttcattgtc tgatatttct 9000

gtagaaaatt tttttactta tatcatcttt tcttaatttc caagagctct ttaggatcct 9060

attaaaaaat aatcttctga tcatgttgca tgaatacagt atcttttgtt tttttttttt 9120

tggagatgga gtcttgctct gttacccagg ctggagtgca atggcacaat cttggctcac 9180

tgtaaccgcc acctcccggg ttgaagtgat tctcctgcct cagcctcccg agttgctgag 9240

actacaggca cgaacctcca cgcttggcta atttttgtat ttttagtaga gacagggttt 9300

ttccatgttg gccaggctag tcttgaattt ctgacctcat gatccacctg cctcggcctc 9360

ccaaagttct gggattacag gtgtgaacta ccacacccag tttcctttgg ttttaattag 9420

ctgaattttt ccaacttttt gaatgattgc acttattttc aaccttctta ctttgtattt 9480

atgcatttaa gattacaggc gtccgccacc ttgcacccgg ataatttttg tatttttagt 9540

agagacaggg tttcacgagg ttggctaggc tggtctcaaa ctgctgacct caggtgatcc 9600

acccgcctcg gcctcccaga gtgctgggat tacaggcgtg agccaccatg cccagccatg 9660

gatacagtat cttaagatat gaggtatttt taattttggt taaatatgtg ttctgttttc 9720

tctgttgcct ctgaatttca tttggtttta tttttttgat gtagaaagct tttctgaaat 9780

gtccattatt atctgactct ttccatcttt aaaaatgtgg tgccttctca tggccacatt 9840

ttctcttctg tcctctttat ccttgcaggg ctccaactct attctttcag taacacttca 9900

gagggttttt agagggagta gatgtgaact tgtgtgtatg attcaccgtt gtaactggaa 9960

cagatatgtt ttaagcagcg ttatacattc ctttgagtgt ttctctgtca gattttgaga 10020

aacagaattg ctggggtaga ggttttttga tcagttgtag ttaagttgtg aatgaacagt 10080

aatgtacatt ttgttttctg cattttgtct acaggtttca aaaactttta acaaacaagt 10140

aactcacgtt atcttcaaag atggctacca gagcacttgg gacaaagctc agaagagagg 10200

cgtaaagctc gtttcggtgc tctgggtkga aaagtaagca gtttctctct tacttttttt 10260

ccttaagtat ctagtattga aaatgkgtgg agatattttt cacaggtcgg agaaccagat 10320

aaagtttgat tttcatcttt tctctgcctc ttacctcacc aagtaattta catcctccag 10380

cctcaatttc tgtggttcaa aaatggtcat gctataatac ctaactctgc ctagggggaa 10440

aaggagcctg caggtcctga agctgggtat gcaaggtgga cttaggaagc aagagggaat 10500

gtgatgaagc agattgtgtt agtcagcaag cgctgctgta acaaaggacc acagaatggg 10560

tcgcttgagc aacagaaaag gactttctca caactctgga ggcaggaagt ccagtatcaa 10620

gttgtccaca gggttggtat cttctttttt ttttttttga gacaaagtct tgctctgtca 10680

tccaagctag agtgcagtag ctggatcttg ggtcactgca gcctcagcct cctaggctca 10740

agtgattctt atgcctcagc ctcccaagta gctgggattc atctcaacct ttgcctcctg 10800

ggctcaagtg attctcctgc ttctgcctcc cgagtagctg ggattacagg cacgcaccac 10860

catgcctggc taatttttgc atttttggta gagacggggt ttcatcatgt tggccaggct 10920

ggtctcaaac ttctgacctc aggtgatcca cctgcctcgg cctcccaaag tgctaggatt 10980

acaggtgtga gccaccgtgc gcggcccaca cagttttgat tacagtaaat ttgtagtaag 11040

ttttgaaatt gggaagtacg agtcctgtaa cttgtttttc attttcaaga ttgtttggct 11100

attttgattt gagttccttg ctaagattgt ttggctgtct tgattgggtt ccttgcattt 11160

ctatatgaat tttatgatca gtgtgtcaat ttattcaaaa aacaaaaaag gcagctggga 11220

tttggtagga ttgtattgaa tctctaatta gggaagtgtt cataatattt aatcttccag 11280

tccatgaaaa tgggatgtgt ttctttttca ggtctcaaat ttccttcagt gatactttct 11340

agttttcagt gtacaagttt tttaccccct aggttaaatt tattcctaac ttttttgttc 11400

attttcatgt gaatgaaatt gttttcttaa tttttttaag ttgttagctg ttagtgtata 11460

gaaatgcagg tgattgttgt atgttgatct tataccctgc aaatttgctg aacttgttta 11520

ttagttctaa atatatttgt gggttcctta gcattttcta tatgcaatgt tgtgtaattt 11580

tgtaaataga gatagattta cttcttcatt tctagtctgg ccgcatgtta tgtcatgtca 11640

tgtcatgtca tgttatttgt tctggccaga acctccagca cagtgttgaa tagaagtggt 11700

gagaatggac gtccttgtgt tgttgctcat ctttggagaa aagctttcag tatttcatta 11760

tttcgtatga tggtaactgt ggtttgtgta aatgtccttt tttaggctga ggacgttccc 11820

attcccttct gttgcaggtg gtttgtttgt ttctgattat taaaggaagt tagatattgt 11880

tgtcagatgt ttttctgcat gactgatcat catgtgattt ttgtccttca ttatattaat 11940

gtggtgtaat tgaggggttt tgtgtgttga agcaaccttg cagtcctagg ataaatccta 12000

cttggtcatg ttgtatacgt tgtcatcttg cctttttaat tgttggaaca ggccggatgc 12060

ggtggctcac acctgtaatt ccagcacttt gggaggccga ggggggtgga tcacctgaga 12120

ttaggagttt gagaccagcc tgatcaatat ggtaaaaccc tatctattaa aaatacaaaa 12180

attagccagt catgttggcg tgtgcctata gtcccagcta ctcgggagat tgagacagga 12240

gaatcacttg aacctgggag acggaggttg cagtgaacca agaccacgcc attgcacttc 12300

agcctgggtg acaagagcgg ggaaaaaaaa aaaagagtaa ggggtccttc tctggctttt 12360

gtcaagattt tctcattatc tttcactttc agaagtttaa gtgtctcgat atggtttttt 12420

aaaatttatt ttgtttagat tttacagaac ttgttgaatg tgtagttgcc tgtgttttct 12480

acatatgatt cgtttttggc cattatttct tcatctatct tttctgcccc gtttcctcct 12540

cattttttct gattagctgt atatcttttt ctaattagct gtataccagg gattggtaaa 12600

gttttctgta aagggacaga tagtcaatat tttaggcttt gcgggccata tggtctctgc 12660

tcaacagctc agctctcttg tggtgtgaaa ggcgtaatag aaaataagta aacaaatgct 12720

tgtgtctgtg tggcagcaaa cttataagtc tggcaggaag ccaggtagtt tcccaatcct 12780

ttctgtgtaa tacacctttt aaacgtttgc atttgtccat agtgctctcc ttcttcttca 12840

ttattgttca gtcttttttt ttctcctcag attgatcttt tttcaagttc attgactttt 12900

ttctccatca aatccattct gctacttagt acctttattt gagatatttt tgatttctaa 12960

catttctagt tggtttttgt atacattctt tttatttcgg tttcatgtga gatttctcac 13020

ctttggtttt cctgtcttcg gttatgagtg tattttctat tacctcgatg agtgtagtta 13080

taaatagttg tcttaatgac cttgtctgat aattttgagg ttggtatctg ttttgttttt 13140

gtttttcttt gatagtgtgt cacatttttc tggctcttca tatggcaaat aatttgaggt 13200

tgtattatgc acgttgtaaa tactatgtag actctggatt cttttctatt gtcgcaaaga 13260

gcatgaggtt tttgttttag caagcagtta acttgtcgtt aaaatgaaac gcacactgtc 13320

attatgtggg cagttgctta gatgcgccct ttaagcctca ggtgcaggct gatttgtttg 13380

cctcaaacac atgttgttca ggggtcagcc agagacttga acttctatac tcagaatttg 13440

gggtttctcc tatggttctc ttacttcctg agtccttacc tcatttctct agtagcccta 13500

gctgcccagt ctccttcccc tggtctcttc agcgagaaag gaggccggag cttctgcttg 13560

agtgcttgct gcgccacacc agctccctca gagactgtgg ctgcctttag gggacagaca 13620

gaaaaagtgg tgatgcccag attcttttgc ttccttttaa aatttgcctg ttctttcttt 13680

ttcttttatt tccctccagc tttcaaagct ctcacatagt tggtttattt tattttattt 13740

ttcctgtatt tccaggacgt atagcttata gttcacctat atttgtatat tggtttgtta 13800

ggcataaaca gaaatggaac ttagtatgtt atttttgaag catctgatgc cagtctaatt 13860

cttcttccct tcaaaattat ttgatctttt tggagactcc ttagggatat tttttatttt 13920

atcatttttt ttttgagacg gagtctcgct ctgtcgccag gctggagtgc agtggcgcga 13980

tctgtgctca ctgcaacctc ctactccctg gttcagcgat tctcctgcct cagcctcccg 14040

agtagctggg atcacaggca cgtgccacca cgcccagcta atttttgtat ttttagtgga 14100

cacggggttt caccatgttg gccaggatga tcccgatctt ctgacctcgt gatctgcctg 14160

cctcagcctc ccaaagtact gggattgtag gcgtgagcca cagtgcccgg ccaggatttt 14220

ttttttaaga ctcatggctt tactgtaata tgttttgaat tgatcattcc agttctggct 14280

tggccttttc aacagattca ggtctatatt tctgcaaaag tttctgggaa ttatagtttt 14340

aaatattctg cttcgttgtt ttgcttttct tctgggactc caattatgtt tacgttgggc 14400

ctgcttagct atcttttatt tcagtcactt tgacttcaac ccttttatat ttatatacac 14460

atacacacac acacacacac acagacacac acacacacac acacacacac acacacaatt 14520

tttaccccaa atacttattt gacagtattt gtttttgttt tttgaagaca gggtcttgct 14580

ctgttgccga ggctggaatg caatgactca gttgcagctt actgcagcct tgacctctaa 14640

ggctcaatca gtcctctcac cccagccctc cctagtggct gggactgtag gcatgtgcca 14700

ccatgcccag ccattaaaaa gttttttttt ttcttttttc tttgagatgg agtcttgctc 14760

tgtggcctag tgcagtggcg caatctcggc tcactgtaag ctctgcctcc caggttcatg 14820

ccattctctt gcctcagcct cccgagtagc tgggactaca ggcgcccacc accacacctg 14880

gctaattttt tttttttttg tatttttgta gtagagatgg gattttaccg tgttagccag 14940

gatggtcttg atctcctgac cttgtgatcc acctgccttg gcctcccaaa gtgcaacccg 15000

gcattaaaga atttttttta tagacatggg atcttactat gtaggccagg ctgggctcaa 15060

gtgatccact cactccagcc tctcaaagtg ctgggattac tggtgtgagc cactgcaccc 15120

agctgataat atttgattca agttcaaggg ttttgttata ttcttcagtt ttgtgtttgc 15180



WO 01/14550 



PCT/IB00/01098 



ttttatttta gggagtgtga tgggttttcc tcagctgaaa tgatttgctt tttctttgtt 15240

tttttaaaat agatttttaa aatggatgta gtctattcta tttccattca ttgcataggc 15300

caggcttgtg gccagagcgt cctcttctgt cagttctgct gtcttgcata gtttctttta 15360

taggtgacgc tggtgaggga gggaggaggg aggggctcgt gtatctcgtt tgcgttttgt 15420

ttctatagga tccttaaatg tttttctctt agtttcttct ttttttcact gccattggtt 15480

caagggctgc cactcccccc agaactgatg tttttcagag cctgcctgtc ctagtcttgc 15540

tcccattcag accccttccc tggagtgggt gctgtgagct gtgtgggttc tctgttgtgg 15600

cagttgtgct gggtgtcctc tttctgagac ttcttttacc tgtgcttcat gtaagttctc 15660

caggctgtac tactttttat ggagtcttaa gcgtattctc cccgactttc tgcatccata 15720

gacttgcagc tgtgttggaa tttgattatt tttctactta taggtcatct gaatttgcgc 15780

tgttatctcc gtgtcagtga gaatgtaggt catatgtgtc ttttatttaa gtttcttttt 15840

tattttctgc ttttttttcg gggagggaat ggggtaagac tcagtatcag ccagccatca 15900

ttgttttctc tacctcatct tcttatggag tccattgaaa tggcttattg atttttatct 15960

caaaatcgat ctctcataga tctttatctc tgctgttaca gtcgagacaa gtatcatgtc 16020

ttgcttcagt tactgtagca gcctcatgcc tgtctgtttc attttgtttc ttatacataa 16080

gcaaatgtaa ccccttttgt taccagtgga aggtatccaa gttaccggca gcaaacacgt 16140

atgggtttgc agcaacttca gttcttgctt cctcaaaaga aagaattcca cggaggagca 16200

taaggcaaaa gaagagcctg acgcaagggt cagagcagga gcagaagttt atttaaaagg 16260

cgtcagaaca gaaagaaagg aaagtacact gggaagagtc ccaggcgggc atggaggtct 16320

aatttgatgt ttaaccttga tcctgggatt tgtaggctcg cccttttccg cagttcttcc 16380

cttagggtgg gctgcccgca tgcacagtgc gggaattgag cacaggcagc ttgtttagga 16440

agttgtgtgg gtgcccatct gaagctttct tcccgtttct ccgccatttt gtctcttaat 16500

gtgcatgccc gggaaatggc ctctccctgg cgtctgcatt cagttaacac tttagcacaa 16560

caggtgtgga ctgtcaggaa atggcctctc cctggctctg gctgccaatt tatcactttt 16620

agagaggcaa tgtgataatt gttgagctat cacccaacat tcctagtggg tggtagaggc 16680

ctctcctgcc gggcttatgc ctaactacct gtgatacttc aacacatgga tcagctttat 16740

ccttctgaca aaatggctta gagttcagtg gtctatagca gagaatggcc aactatcatc 16800

ccccagccaa atccaccctg ccatactgtt tatttttttt taatggccca tgaggtaaga 16860

atggttaaga gaaaaaaaaa attcaaatgt ttactatttc atgatattta cattatatga 16920

aattcaattt tagtatccat aaataccgtt ttattggaac acaggcatgt tcatctgacg 16980

atgtagtcag tggctgcctc tgtactacag ctgtagattt ggatcctgtg gcagagacct 17040

tacggcccat gaagcctaag gcattcacta ctttcccctt tacagaagtt tgctgaccca 17100

ggtccagtgt gctgcatgat ggtccccttc ccttccattg tcagctgctc ccctctccct 17160

tgtttgcgtc ttccaaatgc tctaggcttc cacgtcccca aggctacact ctttctgcct 17220

ttagttcttg gcctgtgctg agaactctgc cccgtcttcc tgattctaaa cccagttttg 17280

tagtcagctc ctttatacat gttgcattgc aaggtcgctt tatcagaaga gcttcctctg 17340

tccccagttc acagttcaag ccctatttgt tattctctgt ctcagctcct tttttcctgt 17400

gtgtactatt aaaacttatt ttgttcattt gactgcttta tctgtctgtg tatctaatca 17460

tgcattttgt ctttctattg taatgtggat tccaagagca gctacctgtc tgtcttattt 17520

atggttgtgt ttctagtaag tctaacattc atctggctca tagtagatgc tcagtaaata 17580

tttgttctaa caaattatga acaaaggaaa atttagttaa gtggcgtaga gatactagag 17640

aaaatatcat gggggaaaat gatttgaaaa aaactacatt ttaaaagtcg tatagaaatg 17700

tggaggggag agtgcagaaa cagagacctt tactagaagc ttgaagtaaa tggagatgca 17760

tggacaaaat taaaatagta gccatttctg tacctaatag ggcctctcag ctaaccctac 17820

agtggggatg gtcactggta gtgtgttctg ctgagagtta gggattctta ctctgctttg 17880

ctggcccagc ccctgactca ttctctatcc cctttctctc tctctctatt tctgcccacc 17940

actaacccca gcctttctca aggggctcat gcagacccca taatacttgt aacttcgtta 18000

tccaaaagca aagttttctt tttcttttct ggagactgag tctcactctc ttgcccaagc 18060

tggagtgcag tggtgcgatc tcggcttact gcaacctccg cctcctgggt tcatgccatt 18120

ctcctgcctc agcctcccga gtagctggga ctaccggagc ccgccaccac gcccggctaa 18180

ttttttgtgg ttttagtaga gacggggttt cactgtgtta gccaggatgg tctcgatctc 18240

ctgaccttgg gatccgccct cctcggtctc ccaaagtgct aggattacag gcgtgagcca 18300

ctgtgcccgg ccaattttta tatttttagg agagacaggg tttcaccatg ttggccaggc 18360

tggtttaact cctgacctca ggtgatccgc ccaccttggc ctcccaaagt gctaggatta 18420

caggtaagag ccaccgtgcc tggcaaaagc aaacttttaa ggtcctcaga agctcaaaag 18480

tgaacttaat cttttggcat ttttcttttc tttttttttt tttttttttt ttgaaactga 18540

gtctcgctct gtcgcccagg ctggagtgca gtggtgcaat cttggctcac tgcattctcc 18600

tgcctcagcc tcctgagtag ctgggactac aggcgcccgc caccacgcct ggctaatttt 18660

tttgtatttt tagtagagac ggggtttcac cgtgttagcc aggatggtct ccatctcctg 18720

atcttgtgat ccgcccgcct cggcctccca aagtgctggg attactggca tgagcccctg 18780

cgcccggccc atacacttta gtcaactttt tattacaggt catttttttg cctgtacatg 18840

cagatacatc ccactttata tatataagaa tattttgtag tagctgtcca gtaatttatg 18900

taagcagtgt cctattggtg attgaagttt ttcatttctt agttattttt ttcaattaga 18960

aatattacag cattgagctt ctgtatgtat tacctttttg cgggtgataa atctttccat 19020

aggtttaaat ccccaaagtg ggctgttcat ttctgagagt ttacacattt aaatatgata 19080

gatgctgcca aattatcttc tggaaggagt gtactggttt ccattctcac tggaattatc 19140

aaaaaaatgc atgtttccca atacctttgc taatgttgtg agttatcagt tcttttttct 19200

aatttgtaga agaaaaataa tagtttttat ttgcatttct ctgactttta gtgagcttga 19260

attttcttca gcagagcata gagataagag ccaaactgac ctgcattttt tatgtcacgt 19320

ctgtcctttc ttggtgaact gcctgccttt ccaatgcagt agctcatggt ttccactgaa 19380

aatgtgaaca ttaacttcat aaggtcacta ggtgtcacta gaatcccatt ctgttgggtt 19440

ccttctggga gtgttcattt taagatcaga tggcaattga taaaattctg acatttcctt 19500

tggatgtaga aatttttacc ttgaagaaag aatacataaa gttgaaataa aggtcagctt 19560

ggcccccact ctaagttctg ttgaagacaa tttatcattt ttaaacaact gcaaactaac 19620

agctaggtgg ggaatacggt tcacaggctt tgtccttgct aggctgagag ttggttgctg 19680

accgaagcca tcaccccctg catttagtgt ttgctggaaa cagggacata ttcctgcata 19740

accacaacac aggccgacat taggggctta ccacggctcc tttcctcccg gaatcctcag 19800

actccattcc tatcctacca gcagccagct ccacttcccg cctcctcagc cttctcaccc 19860

tgcagccatt ccttagtctt tcactggctt ttgtgacttt gacactgttt aaggtcactg 19920

accagtgata ggaacgtccc tcagtttgga acggtctgat gtgtcctcct aatatcacat 19980

caatgtgtaa cagtggatgt gtagccattt aggactgggc aaattactca actgctgggc 20040

tctaggttcc tccagtagct cctgagttaa cttcctacgg ttatttagtg ctagaccaca 20100

gaagttcgct ctctgctggc agagcactgt tgtgcagact tctctgagtc tcctgtgttc 20160

ttccttgtgt gtcagggaca cacgtgaagg atagcgtgct tcgcggctgg aatcttcaag 20220

gagatgccat tcactttttt acctcactaa cacagtgccg tttacaaaaa agattaatgt 20280

acttttcctg aattgactta ctgactgggc ctagagaata agatactggt gctgggcagt 20340

ttggcacaag agtagtataa agaatgcagg attggcccag gtgaaggcat cgtcctaagg 20400

gtagaatggg agtcggtggt tcctggccga cctagcaggt gtactgtggg aagtgctgga 20460

gtgaatcggc tctctgggga gaataagctc atcacagcag ggcttcccga ggagaacgtt 20520

gctgctttga tttctgttgg ctctgaggca gcagcaggtc aaatagttgg ttctctgttt 20580

agagacatct cttgaaacac ttttcgtttt gaccactaga tggtgggata atgttatcat 20640

tttacatttc tgaagaaaaa tagaaatcta actggaagct tttttgtctg ttcagtagat 20700

tttggttgga cccctggtaa acatgggttt cagtgtagca gctttaatgt gttaccacgt 20760

gtgctaaagc atagctgttg gcatgcagaa cggcattacc agcagtaagt gccacttact 20820

tcttcatagt gagtgatgat agttacaccc aggtagatga aattcaggga gagcatctct 20880

gtgcacctta catcttatca ctctgaagga tatgtggttg ggaagcttct cccaaaggaa 20940

cagaacacat cttccacaac tgtataacct atgtcaggca cacgttttcc tgggttgaat 21000

caagcccttc cttaaactgc taacttaaag aatacttact ggttttgtaa agtttggcaa 21060

atgatcttct ctgctcctcg gttttctgtg ttgtgcaata ggaggcaatg gtagtggctt 21120

ttccagcacg gttggtgtga ggcttctcat gagctgggtg acctttgtcc tgatgatggt 21180

ggtgatttta atactgtgta tttgataaca cgattatcta gggtctcctc tacgtctttc 21240

gtccagatgc atctcagcca cccccctttt gctgttccct taggcataat agtggtaaat 21300

cggtgacatt ttgcttgagt aagaagaagc tgctaaaaac ttctcatgct taaaattggt 21360

aattaagggg actttttaaa aagaagcaca gttaaaaaac atttccttcc tcgttctctt 21420

ccacccgcct ccctttccca tcacttttat tagatacagc attctgctca ccccattatt 21480

gcaggctcag atagttggtt tgttttttta aaatcagctt tataaaaaca tttacataaa 21540

ataaaatgga cccattttaa gtgtacattc acggattttt tgtgtatacc tgtgtcacca 21600

ccacaaccaa aatacagagc attttcatca ccccaaaatc tccttcgtgt ccatttgctg 21660

tcggcctccc tgccccctcc tcccacccca gggcagccac agatctggtt tctgtcatta 21720

aagattagtg tcaccaattc tggggcttca gatcagtgga atcatccagc gtgtactatt 21780

ttgtgcctga catcactgaa ggtgatgttt ttgcgatctg tccgtgttgt ttgtagcagt 21840

ggtttcactt ccttttatag ctgagtagta ttctattgta ggcatgtagc ttggtgccac 21900

cagttgatgg agattgggct agtttgccat tttaggttat tatgaataaa gttacaatgg 21960

acatttacat ttgtgtcttt gtatgctttc atttctcttg ggtcattacc caaacttttc 22020

caaggtggtt atggcactgt atattcccac cagcagtgtt cctttcactc cacgtcttca 22080

ccaatagttg aaatttatcc atcttttgaa ttttagccat tcaagcagat gtgtagtggt 22140

atttcatggt tttttttttc ccaacattgt tttaagatct aattcatatg ctacacaatt 22200

tgtccaatta aagtatacaa ttcagtggtt ttaaatatac agtcaggtat tgcttgacga 22260

cagggatgcc ttctgagaaa cgaataggtg attttgttgt tgtggagaca tcacagtgtg 22320

tattaacaca cacctgcatg acatagctac tgcacaccta ggctctgtgg cacaacctgt 22380

tgctcctagg cataaacctc tacagcatgt gcagttgtga aacagtggta agtatttgtg 22440

tctctgaaat acttaaacat agaaaaggta gagtaaaaat atggtataaa agataaaata 22500

tggtacacct acatagggcg tttactatga attgagctcg ctagactgga agttgctgtg 22560

gttgagtcgt tgagtgagtg gtgagcgaat gtgaaggcct aggacattac tactatacag 22620

tactatggac tttatacacg tcatacagtt aggttacact ggatgtatat tttttggagc 22680

aactgtatta actgatacta taacgttttt ttaaagacaa ggtcttgctt tgtctcccag 22740

gctggagtga agtggcacat ttatggctca ctgtagcctc aacctcctag gctcaagcaa 22800

tcctcctgcc tcagcttcct gaggagctgg gactacaggc gtgtgccact atgcctgggt 22860 

aatttatttt tatttttatt tttgtagaga cggcattctt gctacgttgc ccccactagt 22920 

ctccaactcc tgacctcaaa cagtcctcct acctccgcct cccaaaatgt tgggattaca 22980 

catgggagtt attgcacccg gctcctccca taagtaaata atctatctct ctgttacttg 23040 

tggtgggagg aaaagaaaaa aaacacctag gttatgtata atacctaata cgagtacttc 23100 

gtaagtagtt attatactgt tttttttttt tgaaacggtg tcgctctgtc gcccaactgg 23160 

agtgcagtgg cgtgatctcg gctcactgca acctctgcct cccaggttca agcgattctc 23220 

ctgactcagc ctcctgagta gctggaatta caggcacgca ccaccacgcc cggctaattt 23280 

ttgcattttt agtagagacg ggtttcccca tgttagcctg gatggccttg aaccgctgac 23340 

ctcccgcctc aactcccaaa gtgctgagat tacaggtgtg agccaccacg cctcgcctat 23400 

actgtatttt tttttaattt ggccttacta tagctttttt acatgataaa ctttgtaatt 23460 

ttttaaattt ttttactctt ttgtaatgcc ttaaaataca ttgtacaaca gtataaaaat 23520 

accttatatc tttatcagct ttttctatgt tttaatttta atttttactt ttaaacttaa 23580 

aactaggaca caaagacaca cattagcctg ggcctacaca gggttaggaa catcagtatg 23640 

tcgctaggcg ataggaattt ttcagctcca ttataatctt atgtgatcac tgttgtgtat 23700 

gtggtctgtc attgaccaaa aggttgttat gcggcatata actggattca cagagttgtg 23760 

caaccgtcac cacaatttaa aaacattttc gtcacctcaa aatgaaactt gcacccctta 23820 

gccctatccc ctattctccc gccagccaag gcagcctcta gtagtctact ttctttctct 23880 

gtggattttc cttttctgga catttccaat aagcggaatc atatgatata cggccttcat 23940 

gtctggcttc tttctcttag cataatgttt tcaaggttca gcatgttgtc atctgtatta 24000 

gaatttcatt tctttttatg gtggaatcat gttccattgt atggacacgt gcgcacgcac 24060 

acacacacac acacacacac agaagaacta aatattacaa ggcttatcat gaaaaacaat 24120 

ggtctctttc ttgacccttt tcaccctcaa ttcctgttcc ccagaggcag ctcctttcac 24180 

acttgtggct gcttctgcag ataagctgtt cggtgacctc catattttaa atactgtggc 24240 

cgtattgctg tttcggtttt tcagtttcag gtattatcta gtgactttct gatagggaag 24300 

tgagaatttc gtttttaatc cgcccctctg agtgcacctc actcccacat acactcatct 24360 

gctgtttgca tggacacatt catgtgcagg ctctttccac tcttgattgc agtgtacatg 24420 

atacattttg gttaaatcgg tagtttatgt ttacatcatt atgactgtgg aagttgtgtg 24480 

ttaggctgaa tctcagagtg aaccatgaat atatttcctt tcgtggaaaa ctttttgttt 24540 

tccctgagct tggcctggtg tcctttgagt ccagagcttc tcaggctcca cttatgtgaa 24600 

catggaccca gtgcccccat tggacacagg gtggcagtga gtgggcacag gcaaggagag 24660 

aaggagagtc gctccctctt ttcagccttc caccctctgc cctctgcact ttgccccctg 24720 

ccccacccca gactgctgtg gcttcacctg cgcctcctgc ccttgagggg ttctgagctc 24780 

caggttctga gctccagatg gactcctccc ccgccccagc tgccaggctt gggtttccct 24840 

tttttttttt atttgtttga tttcatttcc ccagacagct cttatctact ctttattttt 24900 

gttggtttat gtctttttgt tttcctttac tatcatttta ttggggtttt gggggtcaag 24960 

agaaaagcat gtgctaagtc caccagattt aaccagaggt caaaaacctt ccatttttat 25020 

tgtctaaata ttattcagtt aaggattccc cctccccatc ttagtcccca actgcctttg 25080 

ctgaatcttt agcgtctcct gccacagtta ttgcagtatt ccctgactgg cttcctcctc 25140 

ctggaccagt gatctgccca cgacccctcc ctcacacctg tccccatgcc ccagacccac 25200 

aggacagggt ccaagctcat tagcttagaa agtacaaccc ttggaatcac atgaattctt 25260 

tttttgttgc tagtctccta agttgcattc attcactcag tcatacaaat ggtgtatgtt 25320 

ttccccacaa tgtcaccctg tttgctgcac tgtgcttgag tctatgctct gcttccagat 25380 

ggaagatctg tgtcctccca catctgcctc cttgtcagag ttgagtctgg tgatcatctc 25440 

tgacctgaag ctttctctga accatactcg ttatgcaacc tgttgctgct tttctgcctg 25500 

gttgtacttc tcttgttaca attactgcac tgtgttcttt tttaaatttg tacatttttg 25560 

cagatttctc tgatgcctgg cttaatagaa gacagttgcc ttctcatatc tgcctctgca 25620 

ttcagtgtat tggggtggca catgtcgttt tgcttcggaa aattccactg cattgtatac 25680 

tgaggggata atgcgagatg agaaaggaaa atcacacgtt agtgttgtta taaagatagt 25740 

attgacttta cacaccctca gaagggggtc agggatgcca ggatgacatt cactacccta 25800 

gtgtcactta ccacattgca tagaccatac tgtgccgtac agaggcacat atttctgaaa 25860 

cttcctttat tcctaatata ttttgtagaa atttctatat cagtatggat atgtgttttt 25920 

tattgcagtg tactttattt tttcaaataa ctgttcgtgt gttagatgtt gaacggtgat 25980 

aggcctgtga gggatagttg gagaggtgac tagaggcctt ataaaaacac ttaaacagca 26040 

gatgagtgag aatatgctct aaacatggga gtgacagaag gtttttatct aggttgggaa 26100 

gaaatttaag attaatattt caggaatgta tgagtgaatt agaagaggag aaacaaatag 26160 

tagggcagga gatcatttag aaaatcataa ttatttagac ttgagtgaca gaatgctaag 26220 

aaggagataa gggtcacagg aatccagaga tacgaaggtg gacaggagaa atggcaggtg 26280 

tgtccacagg gcaggaggag gaggcttggc aatgcggagc attggttgca cacctgggcc 26340 

ttggggctga tcgtggtgtc tggacagaaa cacaaaaagg acaacccaat tttggaggaa 26400 

agagatgtcc tctgacttca atttctttac gtcccttcta cctctgaatt atctgtttta 26460 

tggcctgttt actattaaat gatccattta atagcattta cccttagctt tatgagtacc 26520 
atgcactaat 
cctttacctg 
tttgatttat 
tgacctagct 
gtataataga 
ctggagcaca 
gcctaattaa 
attcgttttt 
cttaaatatg 
gattatttta 
ggccatggga 
acttcagaat 
aaatctcaaa 
tgctgtacaa 
gataacaaaa 
acctacacaa 
atgttgccca 
aattgggatt 
tggattattt 
ttttttttaa 
atccacagat 
tcatttgaga 
gttctaattg 
ttttcacatg 
atttcctgta 
gatatttctt 
gaacttcttc 
tgtgacatgc 
ctttgtaaaa 
gctttcaatt 
actcacacat 
gttgccttct 
ctctactcag 
ttttagtcac 
ctccaagtca 
taccagctcc 
acctgtaatc 
gagagcagcc 
aacctggtgg 
tgaacccagg 
cgacagaagg 
ctgaccaata 
tgcttgtaca 
tccaaatgta 
ggttctgtgc 
tcctcaagca 
ctggaacatt 
ttgtgtagat 
cttcccactt 
taacacactg 
gaccccaaaa 
ggagtgtctt 
cctccaggca 
gtccccaaac 
gtcatgcagg 
tcccaagaca 
atacaagctc 
tcatgttttt 
ttagccttga 
ttacgtgggt 
ccaaggaacc 
ataaaagtgt 
gggctaaaga 



aattttgaag 
cctcttgctt 
actgactttt 
atggatttct 
agcaaatact 
cattgatgaa 
aaaaaaagta 
attttttatt 
ctattgatga 
tcctatgact 
agactaaaag 
gaagtttgtt 
agtccgatgg 
atacagttgt 
tccactgatg 
atcctcccgt 
agctagtctc 
acaggcgtga 
aaaataccta 
tttgtgttaa 
ctggaacttg 
tagaattcac 
tgatttcttt 
agattctctg 
gactagagga 
cgttacattt 
agtcttacct 
taaaatcaag 
catttagtaa 
tgatgagtat 
gcatgcttga 
cttccccaat 
ttttctgtat 
tgtcagtttt 
gatatctaga 
accaatcata 
ccagcacttc 
tggccaacat 
tgcacacctg 
agatggaggt 
ctctgtctca 
ttggatgtta 
ctttaccact 
attgcaaatt 
ctctgtctcc 
tgccaatcgt 
cttcccccaa 
gttactttct 
gtgccaacta 
tagtaatgtg 
taacaatgtg 
cttgtatact 
aatccttcat 
ttcccggacc 
tactttagtg 
catacaattt 
catgatgaca 
tgtcaataaa 
gtagctaaaa 
agctagctac 
tagacagatg 
tataggtaaa 
cgggaagtca 



tatgctacaa 
ctgttatact 
gtatttgctg 
taaattgcta 
cattagacta 
tcattgttcc 
agtacatgat 
tctagaaaat 
tattgttctt 
tcttcgatag 
gtgccaagag 
atatctgtag 
acctacatcc 
ccttcattat 
ctcaagtccc 
atacctaaga 
aaactcctgg 
ccactgcacc 
atacaatgtg 
attttttttc 
cagatacaga 
attttaacca 
atctgagggt 
ttagagggag 
gaaataactg 
catctcagac 
gattataggt 
tgtttattgt 
acacagactt 
cctaggtggc 
cgtgaagggg 
taacaatatg 
tgcattatac 
tgctaagtat 
gataaaatta 
tacaagatga 
aggaggccaa 
ggcgaaaccc 
taatcccagc 
tgcagtgagc 
aaaaaaaaaa 
ctaatctttt 
ttaattttca 
aaacccgact 
cccctcaccc 
atttctgttt 
aatccttaca 
ctgtgagacc 
tgcaagtctc 
ggcagtttga 
tccaacctgc 
gctggccctg 
ctcactactc 
tgtggctata 
aaagcagtgc 
acttatttat 
aggatctttc 
tgagtgaata 
gaagttcttt 
ccatgtaagc 
acttaaattt 
atcgatgtga 
atgggcagtt 




gtcaaaaatt 
taaataccag 
ttgtatttat 
atacatgtgc 
ccttaattta 
ctgcagctaa 
ttcaatgtag 
aaaacttcta 
ttcacatagc 
catttgtatg 
acaagcaaac 
tcaaaatacc 
aagtgtgcaa 
ccacagagga 
ttatataaaa 
aagaattttt 
ccccaagtga 
tggctcctcc 
aatgctttgt 
ttttgaatat 
gggcaaactg 
caaccttttc 
actttactct 
gatttgatga 
tgaacttcac 
aagccatagt 
gaacaagtgt 
aaaaacacat 
ctctttgatt 
atttcttcag 
ctctgctatc 
gttttgacca 
attgtgactg 
attacttaag 
ctgggtaaga 
cctatttctt 
ggcgggcaaa 
catctctact 
tactcaggag 
ccagatcatg 
aaaaaaaaaa 
taatttttcc 
tgtctaaaaa 
caaggcctta 
tgtctgctgc 
cagagccatt 
cgtgacccgt 
tatcctgcca 
ttttcatctg 
tggaatacag 
accgtacctg 
agtctatatg 
ataaacaatg 
aagcaccact 
aggccggttc 
tatgtttatt 
taggttgcaa 
actaacagag 
ctatgagact 
aaatttgggc 
ccctggggtc 
agttcagtat 
ccaagaacag 



gttgtgtaaa 
atagagatga 
tttttaaaag 
agatttagtg 
attatacaga 
tatgaatgaa 
ataatggcaa 
gaaatatatt 
atttttaagt 
aaatggaaaa 
atttaggtgc 
tgcattctgt 
agtcatttat 
tcagttccgg 
tgccatagta 
tgtagagaca 
ttctcctgcc 
catgtacttt 
aaatagttgt 
ttccaatcgc 
tagagttaaa 
ggctttctat 
gaaacatcac 
ctttctccaa 
atttcctgaa 
ttgcccatgc 
tcagcagtct 
cagtagtaca 
gccctccctc 
tacattacac 
ttatgtgtat 
tttcatgtcg 
tttttccggt 
ccacatattt 
atacatacac 
ggccagatac 
tcagttgagg 
aaaaatacaa 
gctgaggcag 
ccactgcact 
aaaacctatt 
taatctgaag 
ccttcccttt 
ttcttttggg 
ctgcctgccc 
gcgttatctg 
tttccagcct 
cccgtttata 
cagtgctgac 
ttgctagaga 
agaaccagga 
aaccagcccc 
ttgacaggcc 
gtctaattag 
caagcctgtt 
tgctgtcatt 
gaccagcgcc 
caagatcccc 
ggagcaaaag 
tggtagcttc 
ctataagaaa 
gtgtatttgt 
aaagtggggt 



aattgtactt 


26580 


ttttgggaag 


26640 


tctgttaaaa 


26700 


ctgtgtcaat 


26760 


tgcaggacag 


26820 


cacttatcaa 


26880 


ttaggaattt 


26940 


caagagttgt 


27000 


gaattacaga 


27060 


gcctgtggtt 


27120 


tttggtaatt 


27180 


ttagccagat 


27240 


taggaaaatc 


27300 


gacccccaca 


27360 


tttgcatgta 


27420 


gggtctttct 


27480 


tcaacctccc 


27540 


aagtaatctc 


27600 


tacactgtat 


27660 


gactggttga 


27720 


gacattgctt 


27780 


ttatgtaaaa 


27840 


agccagcttg 


27900 


actgaactac 


27960 


aatagtcaat 


28020 


agtgatagat 


28080 


ctggactccc 


28140 


tgcatatttt 


28200 


aatgtaagca 


28260 


acatgtacac 


28320 


catttggtga 


28380 


gtagctttga 


28440 


attcatgtac 


28500 


gagtttattt 


28560 


attttgattt 


28620 


agtggctcac 


28680 


ccaggagttt 


28740 


aaattagccc 


28800 


gagaattgct 


28860 


ccagcctggg 


28920 


tcgtgatact 


28980 


cattaatgat 


29040 


ccttctcttt 


29100 


tccttgagat 


29160 


agcttgctgt 


29220 


tttcctctgt 


29280 


ccctatggct 


29340 


ccagcagttc 


29400 


tggctcctcc 


29460 


agattcacag 


29520 


agcgcaagat 


29580 


actggcagag 


29640 


agcacaatct 


29700 


tacattttgt 


29760 


gaaatgaacc 


29820 


ccttaccagc 


29880 


tgacataaag 


29940 


agtataggca 


30000 


aagttagcgt 


30060 


gcgctgaaaa 


30120 


gaagtcaggc 


30180 


gctgatggct 


30240 


gggtaaggct 


30300 
gggaacgtga ggtgtgtttc aaaggaaaca tttcccctgt ctgaggatgg ttaagagtag 30360 

agttaaccca agaccttcct gtggatatca gcctggggtt tcatgtgttt gtgagtgtag 30420 

ttacagtttt tgggttttac tggctgattg gagttactgt gatttaatga cggtagggca 30480 

agcataatca tggttctttt ctttggtaat tataaaatag aaattgtttt attactgtgt 30540 

cgtggtcttg cagggaggat gacgtgagaa tagtgctacc aagcaggcag tgggcgtgct 30600 

gccaacccac atagagtcca agatcatgcc acttgttttg agaaaagaaa ggctttattg 30660 

caagttgcct ggcaaggaga caggaggaaa ctctcaaatc cgcctccctg aggtgggggc 30720 

tcaggcagtt tcataggcag agaaaacaaa gtgtgatctg attggatctt gcaatggggt 30780 

gatgctggga ggtgtcatct gactgggttg tgtcacaagg tgatgccagg gctcaatctg 30840 

attggatcat ggattatgcc atcaggtgtt tactccttaa tttggccccc gttccttggt 30900 

ctaagtgctt aggttctgcc cgtggttaca tgcttggttc acctgggcat gctcaagtga 30960 

cgtaacttgc aacttcaggg gccgtggcaa ttaaacagtt caccattttg atacacaaag 31020 

ttgaactaga ttgggctggt ttggtggtaa gaacagcaaa aaatcgaaag agactggcta 31080 

aaaactttca tggaaactaa gaatgctagg atcatgaaaa tgtctcacaa agcataatac 31140 

agagcctttt atacagtctt ttaaattctg tccattttct ttataactgc acaaaaaaat 31200 

aaatattgcc agttcacata cagtgcaaga aacacctctt ttagaatttt ttattactga 31260 

tgttataaaa ggtatcagaa atgtatgcga aagggctttt tctcctgcct taagcagttg 31320 

cagtacagca ttaatttttg tgttcttttt gcacagcgta aatgtatgca gcccaaagat 31380 

tttaatttta aaacaccaga aaatgataag agatttcaga agaaatttga gaaaatggct 31440 

aaagagctac aaaggcaaaa aacaaatcta ggtaagctaa gaaatataat acagttcttt 31500 

gcatttgtgt ccatacacct tgtttaattt gcatgatgac tagtggggtt cagcatgaga 31560 

gagctgatga agactatgat agctttactc tatgaaggag aaaacaaaat gtcaggagcc 31620 

tgcgggagac ttggctggga gccataatag agccacgcag cttgagctaa tcgaccacag 31680 

tcttaaccat tcatcaaggt ggtcgaactt tttattttcg ggaatgattt cagaagaaaa 31740 

gcaaactttg gctaataagc attattgaaa taaataccta tttatttctt ctttatatat 31800 

aactttgtat ttttacctaa ttggcatttt tgttttgtta ccctgaatag gcaaatctta 31860 

gatgatacat tattttagtg atttgggaaa atactttaga atattatgtt ctataacaag 31920 

atgtcttaga aaaaaatata tgtattctta tgtatatata ttgttaaata atatttttat 31980 

atataagaat attatgggct gggcacagtg gctcacgcct gtaatcccag cactttggga 32040 

ggcagaggcg ggcggatcac gaggtcagga gatagagacc atcctggcta acatgttgaa 32100 

accctgtctc tactaaaaat acaaaaaaat tagctgggag tggtggcagg cgcctgtagt 32160 

cccagctact tgggaggctg aggcaggaga atggggtgaa cctgggaggc agagcttgca 32220 

gtgagccgag actgcaccac tgcactccag cctgggcaac agagtgagac tccaactcaa 32280 

aaaaaaaaga atattatgaa acattaagat gctttgtacg tttttggtat ttctgttatg 32340 

cctttttcac tgtcgtctaa agtcagtatt tcctactaat tctgacacag cattgctaca 32400 

gataagcaat tatggtcact agaaattcct aggaagcatt aattcctcta gtttttgttt 32460 

tctttgtttt aatctatgtt actatgtcac agattctcta ttctgtgttt tgaaattatt 32520 

caaatagaat tgtcgagatt tattttattt atttttttga gatggagtct ttctccatca 32580 

ccaggctgga gtgcagtggt gcgatcttgg ctcactacaa cctccacctc ccgggttcaa 32640 

gcaattctcc tggctcagcc tcccgagaag ctgggattat aggggcgtac caccacgccc 32700 

agctgatttt tgtattttta gtagaaacag ggtttcacca tgttggccag gatgatctca 32760 

aactcttgac ctcgtgatct gcccgcttca gcctcccaaa gtgctgggat tacaggcgtg 32820 

accaccgcgc ccggccaaga tttattttaa atctgtgacg ataatgcgac agaactgggt 32880 

agaacactta gcccacatag tgctgccaca taattttcca gaaacatggc ctgcatcatt 32940 

tgtttcatgc tcagccctcc cgctgcctca cctggtgcgt gtccatcctt ccttcacacc 33000 

agctgtctcg tcttcgtcaa agctcaagcc agaaacgtgc aatcgtcctt gacatctcct 33060 

tcttcctgac actaaccccc atcaagacca tggccctgct tctgaaatag ttgtttgact 33120 

tcttctgttt tctccttccc tcctctctcc cctgatgcct ggatcatccc tcctgcacca 33180 

ctgcagccac tccttacgct gccctccact gtctccttac agttcatctc tgtgctgcag 33240 

tcacaatggt gaaaacttta aaccagaagg acatcccctc cctggtttaa aatttcctgg 33300 

tgtcatccca aggaaaaata ttcaggataa aatcctgtat ttatcatatc ctccaattta 33360 

ctaggtgctt tatgatctgg cctctctttc tagcctcata gcaatattgc acactctcct 33420 

ataattcttt atacttttgt cactttggcc ttctttccta tgtcagtgac agtgtatttg 33480 

aaaatacttt ggcaacatgg taatgataga tacaaaattt tcttcttaga ccaaatatgt 33540 

atcgtaatta aaaactatat gtataaagta ttaatgattc aactaatgta catttgtata 33600 

ttgtcagaac tacagtaagg gtgattcagg cttaagagtc ccaaaggaga atatattaaa 33660 

tgattcttgg tatttttttg ttgggggtga gtatcaaagt tctgaagggc tctttgagca 33720 

tatgcaaggt agcattccag aaaaaaacac aactctgcac ccacacaaaa cgagctcata 33780 

acttcatggt tccgggacca tgctgatccc acttcatgca gtcaagttca tgtctgggtc 33840 

tgtgagtgtg tttgagggta ggagtgatgg ttaatggggg cagtttctga aacctgagac 33900 

aagaaacaga aactaaattg cattccagct ttacaacttt taacttctgt gtctcagtct 33960 

ttgtcttcaa gtggggatac tgatttgggt ttggatttga ggttggatgc actaatgcat 34020 

atattgttct tagcacagtg cttggtgagg gcagttgctc agcagatgtg agccagcagc 34080 
gatcgtcaat 
ggaattagtg 
ttccacttag 
ttcttcacct 
ttcaaaagtg 
gcaagatggc 
tccttgtact 
atctttttat 
tagctaacat 
ttatctcatt 
cagtgaaaaa 
ggcaaaccca 
gtcacattgc 
gaataagata 
agacacctgg 
aagtagaact 
gtgatttcta 
agcacgggac 
gatgcccacg 
ccagggagag 
ttggaaggat 
ggtacagagg 
tgagtaatcc 
cagtgtaata 
agggaatgtt 
tttagaatta 
tgtgcacatg 
atccccactg 
ctaaaatgta 
ctaatagctg 
ttaaaccact 
ggcttacact 
aagttggaag 
aaagcaaata 
aaatttctga 
gtagtcaccc 
tatcgtttgt 
aggagttcct 
aaatgcaaga 
tcggaaggca 
tcttcatatg 
cctcctgaat 
aaggagagaa 
acagttgaca 
ggtgaaggcc 
tgtagacagg 
gaggaccctg 
gaagaaatga 
atatcaaact 
tgtaacatgg 
atgtgaatct 
tttaaattta 
agaatgtgca 
gtggaacttc 
ttcttatagg 
ttttccttat 
agtattttaa 
ttaagccatc 
aatattgatt 
ttgmtcattt 
aaaggtttac 
aagtcaaaac 
ccaaaattga 



tgaggagata 
cagttagttc 
gatggtaatt 
agtccccagt 
cttgagatgt 
atcaagatat 
ctcacctctg 
tttacacttg 
tgatttcatt 
tattctccca 
ccaggacaca 
agcttctaac 
ttatcgggtt 
tattgagcat 
ggcacagttg 
tcagtgattt 
caaaacacag 
tagagacttc 
taaccatggg 
ctggcaaggg 
gaattgtatt 
agtatagaat 
atataagagt 
cgaggacttt 
tcatgcaagt 
ttgatctcct 
catatggaaa 
ctatcccata 
cccttaagtg 
tactttaaaa 
tttaaaagtt 
catcttttga 
gatccattaa 
atattcattc 
gtaatctttc 
ctsaccaaaa 
ctcctacctt 
cagtaaagag 
gaaagaggag 
ggctgcagca 
atgactattt 
ctcagctgcc 
caagcatatt 
ttaccaattt 
gtgcaacttc 
ctgggaaaga 
ctcttccaaa 
aagaagcggt 
cctctgaagg 
agacgtctac 
ccttttccaa 
tcacaacttt 
tttgatcatt 
tagacttatt 
tcaaaaggat 
tccattggag 
aacctaggtt 
gattgtatca 
tttgaagaaa 
gtaacaagcc 
tcctattcat 
aataaaaaat 
agaaatcttg 



aatagaggat 
catgttacgc 
tttcatggtc 

ggggctcaag 

tatggaaaat 

tagaagtttc 

tgtgctggaa 

ttacatttat 

tttgttgttg 

ttagccttaa 

gaggtcaaac 

ttaggcagtc 

tttgaaaagt 

ctaaaattaa 

gaagggaagc 

cagacagaga 

gccctccacc 

ttcatattca 

gcagtatgat 

cagtgggagg 

tacccagaat 

atttaggagg 

tgaaacatta 

agtggaagac 

ctagaacttt 

ggaaaattta 

tttgcctaaa 

cacccatcaa 

gaaatgagaa 

gttgttttat 

agctctccgt 

tgatctttgt 

tgacattaaa 

atcaccatct 

aaaggaagaa 

gcaggctgca 

atcttcaaca 

aaaaagagta 

caccaggaga 

cgtggcggga 

ttcacctgat 

atcaagccct 

tgaaatgtct 

cacagcaaaa 

gagttgcgtg 

agacgcatgc 

aggacatgat 

tggtctgaaa 

cgaagcccag 

agaagagaag 

gtcaccttcg 

ttcataactt 

ccaatgataa 

ggttaagtct 

aagtagtcca 

tcttggcgct 

cttaatagtg 

aagagaaagt 

ctttggtcag 

ttgtttaact 

aatttaattg 

tttgcacttc 

aactttgacc 
ttctaattag 

acatacatgt 

atatcttctt 

taagtagcag 

ttatcatgaa 

aaacaaagcc 

ttatttaccc 

ttcctagaag 

taggcactcc 

gaggtaggtt 

agcttgtcca 

tgacatcaca 

gtgataaaac 

aagtgacctt 

tttggtggtt 

gttctaacac 

agcaagtgct 

ttccagtagc 

gcatgatggt 

gtcccaggga 

aaagtgtgga 

tagcagcagc 

aagcctacca 

agggaaggta 

ccctagatct 

gtgacaaact 

atttttagaa 

agcccaggtt 

gaactcaagt 

tggtcaactg 

taatgttttc 

ggaaactcag 

agtgatgtgt 

ttcactcacc 

ataaacttgc 

ggtatgtctc 

aaaggccacc 

tcacatggct 

tctatcatgc 

cctgccctgg 

aatcttaagg 

gctcagttga 

gatttttcct 

accatctcca 

acttctgccc 

ccagagggaa 

gatgatttaa 

agcacacaga 

agtgaacatg 

gaaaacttac 

ctaaataaac 

atttccccat 

actctttagg 

gttaggaatc 

tagtatgaat 

gcagcgtgtg 

aggctattta 

gtgaaaaact 

tgttaactat 

tgtacttatt 

taattataat 

acagttataa 

gtctttacct 



aagcagaaag 

ttgtaatgtg 

cgtaccaaat 

tgatccctga 

agccacagca 

tcctttcagc 

atttctctta 

ttggaaacaa 

tctaagtgtc 

ccatcaccat 

aggtcatgtg 

gattacactc 

ataaaacaat 

atttccaatt 

acctgtgttc 

ttacgtgacc 

gagcccctat 

ttatagcaca 

gtgtagcaga 

tgttgacaac 

ggaaagggga 

ttagcattac 

aatggctcac 

agggtgagct 

tacaacagta 

atggatgctc 

gtttgttaca 

ctctagttaa 

gtggttaata 

aaagttgaat 

cagatgaata 

gatgtggaaa 

gtatttcttc 

tcgataaatc 

aaakaaatat 

aggagacgtt 

ttttgataca 

cccattcacc 

cgaggctgca 

aggctcttag 

aaaggtattc 

gctgcagaag 

gcgttggcaa 

gtcctcggaa 

ctgaagaagc 

atggcttttc 

ctcctttgga 

acaaaggtac 

agccatgttt 

ccggaggata 

atgtaacagt 

ttactcctct 

aatagatgac 

tatttctcca 

aactgagggg 

taaagatgta 

aagaaagaaa 

acttttagaa 

gaagmaacat 

ttgcttgaag 

aaaccatatc 

gcacaaatag 

aaagattagg 



aacactggta 
ggagccctag 
ttcttacagt 
aagtactatg 
atgacaaagc 
gcagggttaa 
aacagtctcc 
gtgataataa 
ttattcactg 
cccattttgc 
gtttgtgaat 
ttagtgacat 
tttagatgct 
actgccttga 
ttccttttta 
tccagattga 
tgagggagcc 
gtgacgggca 

gggggcaagg 

ccaggtgggt 
aggcccagag 
tctcaggaaa 
ttttgaatat 
gtgttcattg 
gttcttaggt 
ttttggaaaa 
cctcttctct 
aaatactggc 
gtcttcttaa 
atagaataat 
ctttgctggt 
tcaggaaagg 
acttgtattg 
aagtcctcag 
tgcaggtaaa 
tgaagagaag 
ttcaagaccc 
tccgaaggaa 
gctgtgcagg 
ctgtggggag 
agagaatctt 
tctttctaag 
aaaaaccaga 
aactggaaat 
cctaaggtgt 
ttacaccatt 
aggaagcctt 
cacttccaaa 
tatagttgac 
cagtggaagt 
gcatccatat 
ttttacttaa 
ttgctgtctt 
agacttttcc 
agtgaagtct 
tacgatagag 
ttaaggtaga 
atctgttgtc 
ttaaacattt 
catcacttga 
attttattaa 
gttccagcaa 
ttaaaatttg 



37920 
37980 
38040 
38100 
38160 
38220 
38280 
38340 
38400 
38460 
38520 
38580 
38640 
38700 
38760 
38820 
38880 
38940 
39000 
39060 
39120 
39180 
39240 
39300 
39360 
39420 
39480 
39540 
39600 
39660 
39720 
39780 
39840 
39900 
39960 
40020 
40080 
40140 
40200 
40260 
40320 
40380 
40440 
40500 
40560 
40620 
40680 
40740 
40800 
40860 
40920 
40980 
41040 
41100 
41160 
41220 
41280 
41340 
41400 
41460 
41520 
41580 
41640 
agtgagaatg 
tgaaggtgga 
cttgagtata 
ggcctgctat 
ttatgttcca 
aagttaatat 
aggtttacat 
taatgacatg 
ctaaaatccc 
gaacactgcc 
ggccagcact 
gtgcaggcca 
ggatcttcta 
taggacataa 
gattcatcaa 
tctttgggaa 
gcacactcta 
ctccaaatct 
tacttgttgc 
caaagagctt 
ttgttttgtt 
gcagtggcac 
ctccgcctcc 
gtatttttag 
catgatctac 
ggccaacaat 
ttattctctt 
agaagactag 
tgaaagacat 
gaagcagcgt 
ctagcaagca 
acgaaggctt 
gtgactctgt 
gaggaaaatg 
ggaaggaggt 
gggcagctgt 
aaccaaatct 
ggagtgtttg 
tcttgtgtgt 
atgcatagaa 
acatattggt 
tttttctgta 
gaggtatgta 
ttttattgac 
aataatttgt 
ccatcaccaa 
cgtatttcac 
ccacagttag 
cttagattct 
tgcaattgaa 
aaattccggc 
aggtggatca 
ttctactaaa 
tgggaggctg 
atcatgccac 
aaaaaaaaga 
gtttgggctc 
taacctctga 
ccttggcact 
ttaaagccga 
aggagacaca 
ctatacccca 
ttttacatgt 



cattctctct 
gaagtcatgg 
cgctccccaa 
ttatatcttg 
ttagccttag 
aaaacaatag 
tctccatctt 
gaaatgtaaa 
cgtgttgttg 
cccctgcgtc 
agttgactca 
tcaggcaggg 
cagtcacctg 
aaagacctac 
tgttttaaac 
tctatcttag 
tatctggtga 
ttaatgtcga 
tgttgtcagt 
caatacgctg 
ttttagtttt 
aatctcagct 
cgagtaactg 
tagagatggg 
cctcctcggc 
tttgttttct 
aattttataa 
aatttccatc 
gcacccgttg 
gcaccctcat 
gttaatgata 
ccctgagaag 
caaaatccat 
cttgcaggaa 
tggagagggg 
tttgttttga 
gatttaggtt 
tctgcacgaa 
gtgtctacaa 
ttatattgcc 
accagtaaat 
acagatcaga 
ggtactcatg 
actggaatac 
ggtgggggaa 
cattgattgc 
ttttgaaaat 
ataatttgca 
tctctcgatg 
aaataggtgg 
caggtgtgat 
cttgaggtca 
attacaaaat 
aggcaggaga 
tgcactccat 
aataggaaat 
aggaagtatg 
cagcacaaga 
gccaatcaca 
ctggtcatcc 
gttgtcctat 
cttgtccatg 
cagcaccttg 



gcatgatttc 
tagcgtttga 
aattgtttca 
gctttctgaa 
tatgtgtttt 
atgtgtaaaa 
tccagttttc 
ttaagtagga 
ggagtgtgct 
gacagcctcc 
aggcaccctg 
cccgggtgca 
ccttcatgcc 
atgttggcta 
tgtcccctgt 
aagatgaaac 
aattatggag 
aataaagggc 
atctaagata 
atagaacggg 
ttttgagaca 
cactgcaacc 
ggattacagg 
gtttcacgtc 
ctcccaaagt 
aaaatcttta 
acagtacaca 
tcctcacttg 
tgtctcaggt 
agaggaagac 
aaaagaaaaa 
gtgacatccg 
gtttcagctg 
caacggggag 
taagagccag 
agtgtgatga 
ttaaatgtaa 
gcagccactg 
gcatacaact 
ttaaaaattg 
cttcattttc 
tagtaagttt 
agatgattac 
attttttttt 
taataacatt 
aaatgtttat 
atcttttcac 
ttgagcatct 
cctgcctctt 
aagctcctca 
ggctcacgcc 
ggagttcaag 
tagccaggcg 
atcacttgaa 
cctggacaac 
tccctttgct 
tccacagcca 
gagaatcgtt 
gctcttcaac 
tgaactagtg 
aggtgccgtg 
ggtttgatac 
tacatacgtg 
tctgctctac 
aatcatcaca 
caaaaagaat 
catatattaa 
caaaatattt 
ttctttggta 
accttgtgta 
aaaagctggt 
agtcctcgga 
ggggttgggg 
gtggggacgg 
agaaaacatt 
attacagaca 
gcctaaatcg 
cagccccctg 
cataaagcct 
gggtgaaaac 
tttttgccat 
cagtgtaaaa 
aactgagcga 
tagtctcgct 
tccgcctccc 
cacccatcac 
ttggccaggc 
gttgggatta 
aaatcattaa 
gatacattcc 
cctcttttca 
gctcttcaag 
aaatagtaaa 
caatactgca 
atcacaggcc 
gaggtaacag 
gccagtacag 
agctttggga 
gaagccattg 
ccatggacac 
tccacagttt 
gcaaatagat 
ctttttacaa 
taatagagcc 
attacgcttc 
ataatgagaa 
gtaatacagg 
tcatttaatt 
taaggctgat 
acagacagat 
tcattgctta 
agcgtgtcct 
actgtgcagt 
tgtaatctca 
accaacctga 
tggtggcaca 
cccaggagac 
gagagtgaaa 
cttgcactca 
gtttgcatgg 
gcttatttgt 
caccgaaagt 
cacagcttgg 
tgttcactag 
agattttctt 
ttgtgagctt 



aaatgtttta 
gacatgttac 
gaaaataatt 
atttgacaag 
attttaaaat 
gttaagaata 
ttttttaaac 
agcaaacagt 
agcaggtgtg 
gtaagtagaa 
agaggttttt 
ctgtgtgcgc 
gacgggaagt 
aacccttttg 
ggactcaggt 
tcagtttcag 
ttctgtacag 
ttctgttttc 
aaggcttcaa 
gaaacaattt 
cttgacgccc 
gagttcaagc 
catgcccagc 
tggtcttgaa 
caggcgtgag 
tttttttctt 
cattgtaaca 
ctaattcact 
tttgtgggga 
taagtgtata 
ttggatagat 
tggggaggga 
caggaacaaa 
caaggacttc 
ccttcagtct 
gggcttttga 
tgaagaacag 
aatatttcct 
attttaagag 
aagcagtatg 
tataggtagg 
atgggcaaag 
aaagacattt 
tctattaatg 
ggagttcaga 
ttgtaataag 
actcctgatt 
gaagacgctg 
tatattgcag 
taaatgggtt 
gcactttggg 
ccaacatagt 
tgcctataat 
agaggttgcg 
ctccgtctca 
gtctgaaaag 
gaatggagat 
ggaaatgcgt 
cagtttgaat 
cttctagttg 
caaaagcaga 
ctttgctgtg 
atctgagcaa 



actgcctctt 
ataccttttc 
ttatgttttt 
aaactgtatt 
gttgactcaa 
tcctgttctg 
ttttgaataa 
gtggcatggc 
ttatgttcta 
gsgggtgagg 
tcgctcagtg 
tagtgcgaga 
cactgggttc 
tagtaataaa 
gaaccaactc 
tgtcagggat 
caaactgtac 
agttcacttt 
aaacaagtta 
tggttttgtt 
aggttggagt 
aagtctctgc 
taattttgtt 
ctcgcgacct 
taccgcgccc 
ttttactttt 
aagattgcta 
tcctaactaa 
catagagaat 
acaatgtcag 
aaggtgacca 
gagggagcct 
tgtcctgatg 
ctgagctgca 
ctgacaaggc 
acaggggaac 
actgtgggtt 
tccacacatt 
aattttttgc 
tcatatattt 
gtcagcacac 
agaccaaatc 
tccacaaaat 
agaaaaataa 
ctgagtgttc 
atagatttta 
cgatgtcagt 
atggaattct 
attcatcact 
ttgaaatagg 
aggccgaggt 
gaaaccccat 
ctcagctact 
gtgagccgag 
aaaagaaaaa 
tgctgctgta 
cttcttgttt 
cccacctgac 
tgccaagtag 
cttttcacga 
aaagtccttc 
tgtgatggat 
tttggtcatg 



41700 
41760 
41820 
41880 
41940 
42000 
42060 
42120 
42180 
42240 
42300 
42360 
42420 
42480 
42540 
42600 
42660 
42720 
42780 
42840 
42900 
42960 
43020 
43080 
43140 
43200 
43260 
43320 
43380 
43440 
43500 
43560 
43620 
43680 
43740 
43800 
43860 
43920 
43980 
44040 
44100 
44160 
44220 
44280 
44340 
44400 
44460 
44520 
44580 
44640 
44700 
44760 
44820 
44880 
44940 
45000 
45060 
45120 
45180 
45240 
45300 
45360 
45420 
tccaactacc agggtcttgt tcatcgataa tagtcaccag ttgttggagg tcaatgatgg 45480 

ttaactactc ttctaccttc tatctaccag atcttgttga aggcaagtat cagaaagaca 45540 

tttattaaac atttattggc aagcagttag gaagtggtcc acaaattgac caatatgctg 45600 

aaggcccagt tctctgtcct ttagtgcagt gtccatactt tatctgaaag gtttgctgga 45660 

ggcagacaac attctatggg caagtttctg caaacttgca ctcagcacca gaccatcgtg 45720 

tatcctttga ccctgtggtt tattataggg tcatttaggg attaagcctt ggataccacc 45780 

tccagggata ccagccacaa ctcatactag atggttatgc tctgttctgt gtgggtattg 45840 

ggttcccctg caatatttaa gccaattcag tgttcttgaa tccatgaatt taaccaataa 45900 

gaaactgttt ctcacatcca ttatgctgat taacaagctg atgatgtcac caataaccac 45960 

tcatttttgt catccatttt ggcttttaac aaagcatcta atattgggct ggaggattta 46020 

caggagttgg ggttttttgt tgttgttgtt ttgagatagc gtctcactct gtcacccaaa 46080 

ttggagtgca gtgacatgat cgcagctcaa tgcagcctca acttactggg ctcaagtgat 46140 

cctcccacct cagcatcctg agtagctggg actacagacg caggccacca cactcggcta 46200 

cgttccccag gctggtctcc aacttctgag ctcatgcaat ctgcccgcct ctgcctccca 46260 

aagtgctggg attacagttg tgagccactg tgcccagcct atggtatagt acattttgca 46320 

aattctgagc attcaagagg aactgtgaat tactattgtt gcaaataaat agatagacat 46380 

atattcatta agtatgttaa attgttgcac ttttgactct tcaaataatt cacaagtgta 46440 

ttaagaaccc cctttcccat agcctgccag cctaactcac tggggctgca aaactaagca 46500 

atcctagcaa cttgatgtgg gttagtcagt cttaacagaa ggctattgac cacttaactg 46560 

tttggttgat tcattcattc atttacatat tcatttttta tctgtcagat gtttactccg 46620 

tatctactat gtccaatgta taaacagtga gagaggtaag gttaatagaa agctctgtcc 46680 

cttgctttaa agaacttagc taagtaggga aggtacagtc aagatagttt acacacaagt 46740 

atcaggaaat tcaaaagtca gagcaattac tttcagtggg aattaaaatt gatattggaa 46800 

tgacctctac aacgattaca aaggataaaa ttccgcatta tctattgaag agtgtttttg 46860 

tttttttcag aatgaacaaa gtgaacttga tattttaata gatgaatatg aatacagtct 46920 

cgttagcaga gttttacttg tgtagaaccc gtataacttg catatatacc aaaggtatct 46980 

ctggaaagga atttttccta ggtgtctttt aagattcttt ccagtcttaa tattttgcat 47040 

actacattgt aaaataattt catattcaaa tttttgaagc ttagaagaca tttctcattg 47100 

gataatgtta agtgtatatt tttacatgtt aaaattatgg attattcagc cttcagaagc 47160 

cttttcaacc cttgactctt gcatagtgca ttgtaagagt aaatactaat tgtttaaatg 47220 

tgttattaat attagcattg ttagtcttaa ttctgtatct tggaagtagg aaagtaggat 47280 

gtggaggaaa ataaatgtta aaaataagag ttatttcttc ggccttagct ctagacaaaa 47340 

tttgacacaa gccaagtttc tcctacagtc ttttcatcgt ccacttcttc atctctccct 47400 

ttcctagtat ttaagttaca tgtgtcctta tactgtcttg ccctggatct ggctccaaag 47460 

tgatcatatt agtcattttc ttctcttttc cctcagtatc aatacttttc cttaatcttg 47520 

cttatctctg ttgagtagct gaaggttgtg atttaactaa ttcacactga gaggtgagtg 47580 

agtgatcatt tactagcttt cattgatgtg tttgcatttt gatggtatta ttaatccaaa 47640 

ctaatttcca aatggtgaaa tttcagataa ctgaaagata aaaatgtggg gtctgtcaga 47700 

ttcatttccg tatttgatca tttcgtgaaa acgaagtcaa tgaattgtgt gtgtaatgag 47760 

gttgggagga aaatgagagg aagatatatg gctttcacag ggaaatgctg tggaccaaat 47820 

tgtgtccttt gacccccaca tttatttact gaaggtctaa ccctcaatgg gataacattt 47880 

ggatagggtg atctttggaa gataattagg tttagatgag gtcttgaaga tgggggcttc 47940 

atgatgagat taggaccatt ataaaaagac cagagaactg gcttcctctc tctctgccat 48000 

gtgaagacag caagaaggta gcctccttca agccaggaag aaagccttca ccggaacccg 48060 

accatggggg caccgtgatc tcggccttca ggccaccaaa tctgtggtat tttgttatgg 48120 

tagccccagc cgaagaagac agacattcat ccaactgggg tgtgttggag gaagagcagc 48180 

taaagggtgc atgttcgttg gaatttcttg gagacattca aaatagatgt ccattaggta 48240 

gttggatata gccagccata cctcagctgg gaggtctaga caaggtacag agaattaggt 48300 

ctcttcagta atggacgact ttatgggaag tgatgaaatc accttgggga gtgagaaggg 48360 

agctgatgac aacccatgaa aaaaccacac ttaggagcaa acacgaataa agagtcatcc 48420 

aagaagtggg agagtcagga agaggagggt aggtgtttgt ttacagacct cctgccaaaa 48480 

gtggagtcca actaatcttt ccacagatgt tttcagaagt actttgcact ctcaactgct 48540 

ttgggtttac cgatgtcaat gttaaaaccc actggcaaat tagtgtggca gagtttatga 48600 

aatgttttaa ataaacaaat catttactta gatcattttt tgacttcagg atttgtgaaa 48660 

ttgtgaaaac atgttaacaa tatcagtctt tttttttttt taatatcagt ctttcttaag 48720 

ttttaaaaga ttgtgttgca tttcttagaa ctttatgttt ataaaatgct ttacagcctg 48780 

tttcgttgtt cggcaagaac tgaggcaagt ggctattata aaacttttat tgaatacact 48840 

aggaagctgc aaatttattc atgactcaat aacagagcac tacgtcccaa attatatctc 48900 

tagtccactg cttttccgat tttgacacac tcatgcttca agtaaatatt tgttatttaa 48960 

aaaggaaaat aagtgcgtag tagatataat taataattct aattattttt aatcttaaag 49020 

acgataggag attgcattca tgttctaccc cgggggataa agtgggcctg ggagaaaagt 49080 

cagtgcaagt caaccataaa agatacctga ggaggtacgg gatcagtcag gatgtgactg 49140 

gtttgagtct cgagtggatt cagtattagg gattatggca aagagtgtag gttggtaggt 49200 
ttgtggttta 
cttgaattca 
ctgtttttga 
tttatagcat 
ggcactgatg 
gggttgtgag 
gatgattagc 
tggcaggttc 
ccattgagac 
aagtaggcaa 
gtagatgagt 
gaggttagca 
cttcatggca 
atttgtaggt 
gaccccaaaa 
ttctttgtca 
atacattttc 
tgttaccgac 
ttctagatag 
tgtttccatt 
agttctaact 
tgatgtttta 
gaagaaaagt 
tgctgataca 
tgtcttttat 
gaaagcgttg 
cactttgcta 
gttgtgccag 
aagaatccaa 
tgaaaggttt 
ccttgctctt 
ttaaatgttt 
tattatacct 
ttcagaatgg 
gtagatgggt 
tgcatttgac 
aagacattta 
ttctttattg 
gaaccctatt 
catgtttgtg 
ctccactgca 
agaacgctcg 
cctactgcta 
tttgcatcta 
tgttctgtgc 
ccgtggtacc 
ttaagtgctc 
tcctgtctga 
gtttcagacg 
gtctgtggaa 
ttatgaggca 
tggcattctt 
ttgagtaatg 
cacacctcca 
aactgcctaa 
actttgccct 
gtgaagaatc 
atgcctataa 
agcctggcca 
acaaaaattg 
gtgggagaat 
cactccagcc 
tttctggccc 



gaactggacc 
ggaaagtatt 
aaagagtgac 
tttatctgca 
ttaatttcac 
ctccctaagt 
tatttgttaa 
tagtaccccc 
catgttatgg 
actgtttgaa 
caaatcaaag 
tctaaggaaa 
agacgctggt 
tgaagcctga 
cttagattaa 
ttattttaaa 
ttacattatc 
cattgaatta 
atctcaaaga 
atgcataaat 
aatttttaat 
gatgactcat 
gggagaggca 
tcataatgtt 
tttaaaaaat 
tttgcattcc 
atgcatttct 
tgtttcttgc 
acattcattc 
atatactaac 
cctctgaaag 
ttatcattca 
tgtgcctcat 
tagggaaaaa 
ctgcatgggg 
ctgcggaaga 
aaattacaca 
atgctttctg 
tcaggtaaag 
gcgtgggcaa 
gcctgcggga 
caacactgtg 
catggccgct 
ctaccgttcc 
cctcccgctt 
tctcatctct 
agtaaatatt 
tcaaaaggca 
agtggtattg 
gcactgggtt 
tctcactgta 
ctgtagatca 
ataaaagaat 
tatctgcctt 
cttgctcaaa 
cataatttag 
actcaaagca 
tcccggtact 
acgtggcaaa 
gccaggcctg 
cacctgaacc 
tgggcgacag 
cacctgagac 



ttaaaatctg 
aacattttta 
agtttttacc 
aagaagtctt 
gttgcattat 
gtgggactat 
tcattaggta 
taggctgcta 
tccatgttag 
ggaggaggaa 
tacacatttc 
ccacatttca 
ctaaggtgga 
agctctgtac 
ttattgcctc 
atctaagaat 
aaatggacgg 
agtgaattgc 
gccagatata 
gtgatttttg 
ccccttgggt 
gtgacggctt 
aaaaggtcag 
cttctctggg 
tgattatggg 
agacaaagag 
tataaaaatg 
cattttataa 
acttttgaac 
actcaggtac 
actccgctga 
atatctactt 
ttttatgagg 
ccataccgat 
cttccaggtc 
gaaacctgac 
gatttcatga 
gtttgtctcg 
caacagatgt 
caccaaggca 
cctgctcagc 
gctgagtttg 
tgacctagtt 
cgtgcctctc 
tactgaccct 
attggtactc 
gagtgttgag 
tgaggtttga 
ccttcctgtg 
agggacaggc 
aatggcatat 
aatagtaagt 
gagtggtcag 
caactgctgt 
tttaagtctt 
aatgtcattt 
ggttttaaat 
ttggggcggg 
accctggcca 
gtggtgggca 
caggaagggg 
agtgagactc 
cctcctggcc 
tccagggccc 
ttctacatcc 
ttagactctc 
tctcatgtta 
atttattcat 
atcttgtgca 
atcaacagtg 
cataactttt 
ctcctccttc 
gggtgagagt 
acatttcatc 
cttgaatgag 
aacttggggg 
tgaagactat 
taatatggaa 
ttaagtttga 
agttgatgct 
ttgggatatt 
tacaatttat 
ttttgcttaa 
tttaggtgtt 
taaggacctc 
tgtgtaaaaa 
tcaatgaaac 
taaatgctgg 
ttattattga 
cctgtagctt 
tcggaataaa 
taaccaagtc 
acccttcact 
tcctcttaca 
gcattgctta 
ccaaaaaagt 
tgaaaagcaa 
ctgatacgca 
tcctttgctt 
aagttggcag 
gaaaaaaaag 
ggagagagag 
cctgctctac 
gcgctgcctc 
tgttttgcgt 
catttggaaa 
ctgctgatgc 
ttaccctctg 
tacgttgtac 
ttacttgtta 
ctttctcatt 
cctgggatag 
atcctgggcg 
gaatgggaga 
gctccataaa 
agagggagac 
gctcaggaac 
cttttaaaaa 
ctgaaacgaa 
gcagattttc 
cggatcactt 
acatggcaaa 
cctgtaatcc 
aggttgcagt 
tgtctcaaaa 
agcagctccc 



aggctgcaaa 
tttttcactg 
caaacttagt 
tatgattttt 
ctgcatctac 
ttttgcatct 
cagtttggct 
gcgtcaaagt 
aaaatcccat 
aagaggcacc 
gtgggttact 
tatccttttg 
gagtaaaatc 
tttctagaaa 
ctgcctactc 
cgagtgcgta 
gtagaacact 
ggaatgtaat 
ttaaaaggcc 
gttgtatttg 
aaaaatagac 
atcaaacctc 
tattatttta 
ataaaccagt 
aaaactcaga 
tagagcaagc 
ctctcaagca 
tatttactag 
ttgacctcaa 
ttgtggtttt 
tgagtaatag 
aatttaaaat 
ataatgtagt 
cagatgaaaa 
ggcttgaaca 
cttatcttgg 
taacttgtag 
tggagcaaga 
agactgtcag 
aatggcgttg 
ccaggggtgg 
cccagtttct 
gaaataaaga 
gtcgcatggc 
ccagtgtctg 
catgtctggc 
ctcaccataa 
tgcccacagt 
ccctgaatct 
ggagtgtggc 
tgggtacctg 
tataaggtgg 
aaaatacaca 
aaaaatattt 
tattttaaga 
tccaccactt 
tgggccagtc 
gaggtcagga 
atcccgtctc 
cagctgctca 
gagtcgagat 
ataaataaat 
gaccccagtg 



taacaactag 
agataggacc 
tatagctggc 
aatctctgag 
attgtctatt 
ccagtgggta 
atcacctgcc 
ttgcattata 
gtaagtcata 
ctctgaggca 
taggtctaca 
gtttgtgtgt 
atcatccatc 
atctcaaact 
tgaagagctg 
aggtatgggt 
gtaacctgat 
aaactgaaag 
tataacttcc 
gtccatgtaa 
caacaaggca 
atgaggaatt 
aactttcaaa 
ctatctgact 
atatgaaact 
tttctcatat 
gagaatgttg 
gtaggaggtg 
gccatcagag 
ggctttaaaa 
aatgaggatt 
tagccatata 
gaaacctgaa 
gaatgacaga 
gatgggcggc 
caatggttaa 
aaacttagat 
aaatggaaag 
ggtcccataa 
cgcactgtga 
ggcccttcct 
cagtcttctt 
accagtttcc 
accacagctc 
cccagggaag 
tttttttttt 
aaatactccg 
ggaagttact 
gatgggctgg 
cccttcttcc 
tttgactttc 
tattactgtc 
attacaaata 
tcatatatta 
gtattagtaa 
ctggttctgt 
atggtggctc 
gttcgagacc 
tacaaaaaac 
agagactgag 
catgccactg 
aaatgctgat 
cggcaccccg 
tccttaacgt 
tctcgccaat 
taatccagtc 
aatgccctag 
aggtttgtgg 
ctggccacta 
actactgcct 
tctttctaaa 
aggcttaaag 
ttgggaggct 
ggtgaaaccc 
gtagtcccag 
ttgcagtgag 
ctcaaaaaaa 
gtgttttagg 
aggttagaag 
attttatttt 
tctcggctca 
tagtagctgg 
gagatgggtt 
cttccttggc 
acattaaatt 
gagagaagtt 
tggttttaaa 
gcttctgagt 
taaataggtt 
ctgaaatttg 
tgtcacccag 
gttcaagcga 
tatttttatg 
atttcctaaa 
caggaaatat 
taaggtctgg 
gtagtgagga 
tttttagttt 
gacaggactg 
aggtagaagg 
cggccttggt 
ccacggccac 
tgattgttgt 
tgtctttgat 
ctccattgct 
gattttcctc 
gtttcttttc 
ctgagtttta 
tttttgtaga 
aacaatcctc 
ctctatttga 
tatttatttg 
caatttctgt 
ttagtttgtc 
tgtgaggcta 
tgggtttctt 
ggcttgtttg 
tttcaaacta 
cttttctaaa 
gtcaagagaa 
acctgaattg 
gaattataga 
tttaatggca 
tattttaaat 
gagtctcgct 
ccaccccacc 



ggaggggacg 
agcagaagga 
cttattagtg 
ggtagtgctt 
ctgcctggcc 
ggattttttt 
tgggactttt 
catttctcat 
tcctgatttg 

gaggtgggca 

cgtctctact 
ctcctcggaa 
ctgagatcgt 
aaaaaaagaa 
agcgcaaatt 
gaagaatgat 
tctgagatgg 
ctgcaacctc 
aaccacaggc 
tcaccatgtt 
cctcccaaag 
ttaagtatat 
tatgttgata 
agtatacagt 
agcagcctac 
ttacctgctt 
gaacttaaca 
gctggagtgc 
ttctgctgcc 
tataaggaca 
gaaaacaaaa 
ctttcaccag 
gcctgacatt 
aaatgatttg 
taaatctgaa 
tcagtcgatg 
tacaaaggca 
gtggcccggg 
gggctctggg 
accagagtat 
ctgttctgga 
gggaacgcaa 
tatttctttc 
catttcacct 
acatgtatta 
gatgggggtc 
ctacctcgtc 
tttataaaaa 
aacatattat 
ttttcttatc 
tttatcttcc 
gctgtcccct 
cttccatgac 
tttgtttgca 
cttagtccac 
tagagttttg 
atgatttttg 
acattaattc 
atctcagagc 
aaaatcactc 
atacttgttt 
ctgttgtcca 
aggttcaagt 



aacacctagt 
gcaagaccta 
ttcaccgcac 
cctgactggg 
ccaagccaga 
aaactcctga 
agaatttaaa 
gtaaagttgt 
cgggctgggt 
gatcatgagg 
agaaatacaa 
ggctgaggca 
gccactgcat 
aaaaaaaaat 
tgatagcaat 
ataaatttct 
agtatcactc 
cgcctcctga 
acccgccagc 
gaccaggctg 
tgctggaatt 
aacttcccag 
tgtggtaatg 
ggaattgtgg 
aaatataatg 
attttgttcc 
cggccttttt 
aatggcacag 
tcagcctccc 
tattaaggta 
tcttgtaatt 
ctaatgactg 
tttatgtttt 
aaactcagat 
gtagcacttg 
agagctgtca 
agaaaggtgg 
gatggatgca 
gttagactgc 
tgtttttgtt 
aacttctcca 
tctcgtgtgc 
ctcttgtatc 
ttgactcttt 
tttttcagtc 
tctccacgtt 
ctcccagggt 
ccatttccag 
gcatatttct 
tgttgttttc 
tagcagggac 
gggagtgtgg 
tcactgattt 
tcttgtccag 
cattcctgga 
gtgtgaatgc 
agatgaacac 
agtttctctt 
tataacttcc 
tataaatcag 
caaatttgtt 
ggccgaagtg 
gattcttgtg 
gagggcgaag 
ggtttcccct 
agcctttgct 
ctgcgcattg 
catgctggtg 
ggtgattcta 
caagtaattt 
ttcattttta 
gcggtggctc 
tcaggagatc 
aaaattagct 
agagaatggc 
tccagcctgg 
tcctgatttg 
atagatgaag 
taaaaggtaa 
tgatgcccag 
atttaagcga 
acgcctggct 
gtctcgaact 
acaggcgtga 
tagtttgaga 
agtccacaga 
aaggattgaa 
ttagtatctc 
ttgttagatt 
tgtttgtttg 
tcttggctca 

aggtagttgg 

ttagattcta 
gaatattaat 
aagcaatgcc 
cacttgagag 
gtgtcccgtg 
caggtaatgt 
gtcgtgagtt 
gaaggcctgg 
gaaccgcgag 
tctcatcttt 
gtttatttac 
ctgtgatttc 
tatcctttgc 
tccttatttc 
ttaagtctat 
cctgctattc 
ggccaggctg 
actgggatta 
ttctctgtca 
tttgaaataa 
aatcatacgt 
tgagatgatc 
gcttctgact 
agtagctggg 
cttttctgag 
gacgatgggt 
ttctctgaga 
gtaggtcagt 
aaagagtgaa 
attagctttt 
ccagaaaaag 
gagacttttt 
cagtggccca 
tctcaacctc 



aatccacctt 
ctttcacagg 
tgaatgaatc 
gactcacctg 
tcattaatat 
atgcaaagca 
atcctagaag 
gactctaaaa 
acacctgtaa 
aagaccatcc 
gggcgtagtg 
atgaacccgg 
gcaacagagt 
tttgcttaaa 
gacgtgtttt 
cattaaattt 
gctagagtgt 
ttctcctgcc 
aattttttaa 
cctgacctca 
gccacagcac 
tcttttgata 
aacactaaaa 
ttggtgaatt 
aaccattctt 
tcaagataaa 
tttgagatgg 
ctgcaacctc 
gactacaggt 
ttaagcacaa 
gttgaaaaag 
tctactagaa 
ccagcctaca 
gccctaatga 
cctatctggg 
ctgagtaatg 
agcctgtgcg 
aagagagagg 
ggttttctgt 
ttgagagtca 
ttttgcctgt 
ttctatagcc 
attctggatg 
tctgatgcta 
catttatttt 
gtctcgaacg 
caggcgggag 
aaattattaa 
ctcccttttc 
tccatatcta 
tggagctggg 
ctggttcacc 
caacgtctgc 
ggctcacagt 
gaattttaac 
agacagcagt 
ttgcaaaaga 
aaaaaccatg 
tttttggtgt 
agtctgtttt 
tttttttttt 
gtcttggctc 
ctgagtagct 



ctgtattgcg 
attttcttcc 
aaaaactcct 
gggatctgta 
ggggtgcacc 
gagtttggaa 
aagtttcatt 
ttaaagacca 
tcccagcgct 
tggctaagac 
gcgggcgcct 
gaggcggaga 
gagactcctt 
ggttgagtga 
attattttac 
attttatttt 
actggtgtta 
ccagcctcct 
gttttttgta 
agtgatctgc 
ctagccagca 
tgagcatggg 
tttagtttcc 
aaaattagaa 
tttttcccat 
ctgtgttaaa 
agtctcgctg 
tgcctcccgg 
gcacgccaca 
aattgtttct 
ggagagttta 
tggagaacag 
tgctatttct 
ctttattttc 
cagccctgca 
tgaaggtgcc 
aagagcagca 
ctgacttcag 
aggttcattg 
caggccgtcc 
tttctcacgc 
catgtctcat 
tcttctattg 
aatccatata 
ttttaattat 
cctggtctca 
ccttcatgcc 
tcctatcttt 
tggctcccct 
atatgcctgg 
gttctgtctc 
tcctcttcca 
aaatagctgg 
gaggagccta 
ggccacttaa 
aagaggccaa 
cacactaaac 
attccatgaa 
aatgcccatt 
tttttagact 
ttttgagatg 
actgcaacct 
gggattatag 
gtacctgcca 
gggttttgcc 
tcagcctccc 
ttatttggag 
ggtcattgac 
aattttgacc 
ctatgtatag 
tatttggttt 
ggaacatttt 
atctcatttt 
tcagagcaca 
ggaacaaata 
acatagaatg 
gcatatgctg 
agtctatatc 
tgctcagtaa 
gtgaacatgt 
catcatgtac 
agtaaaagat 
tgcccccaga 
ttccatggtg 
ggaataataa 
catgcaaaga 
ttcttcagca 
gctataactt 
catgtgtctt 
ttggacttac 
ataggagtat 
cattgagaac 
catctccccc 
tgcttcctca 
aagaactgag 
gtgacctctt 
tatatttcag 
ctcatttggg 
cccagtggcg 
atgcagtgtg 
cttcctgagc 
ctggcgtttc 
agacactaat 
ttttccatta 
ttaactttca 
tttcttgaaa 
cctcgaggat 
tgtacgcaga 
gaaggtgccc 
acttttggca 
tgcggtccat 
gcctaacagt 
aaaatatttc 
gtgttttatt 
tcaaatgtgt 
gcctcagtgg 
ctgggagacc 
gctttggcca 
gtggatcacc 
ctactaaaaa 
cgggagacta 
atcatgccac 
aagaaaaatt 
atttaaatag 
ggagtcagct 
tttgagatat 



ccatgcccag 
atgttggcca 
aaagtgctgg 
gatccagtta 
attgatagtt 
tgttttcttt 
ataccagcac 
attctactac 
taatgtgagt 
ggagccatct 
gaaatgttta 
gagcatactt 
cttaaattcc 
atagttgctt 
ctgttagtct 
atattttctg 
ttaaaacatt 
ttcacttatt 
ggattctctt 
gccagatgcc 
ttgaacaaat 
taatagtacc 
gtttaagata 
ccagcttggg 
ggagtaggtc 
taatgaaaat 
tgaccacagg 
agccttttct 
ctttgttttg 
aggaggcaga 
gttttcgaga 
atctttgtga 
tttagattga 
agagctctaa 
tgtagcatcc 
ggaataaaaa 
gccaggccag 
agagaaaata 
ctgtccttca 
attttatgaa 
gatggtgcac 
agttgaaaga 
atgcagcacg 
gacacccttt 
aagataggcc 
ttggtttcca 
gagtgtaccc 
cagcagatgt 
tgctttcagc 
tacagaaaac 
tgttcactaa 
tttggactgg 
tcgtctttgg 
cacaggcgct 
ggtgtggtgg 
tgaggtcagg 
tacaaaaaat 
aggcaggaca 
ggcattctgg 
ctgctggtag 
ttacagctgc 
taatgatagt 
tcggaaactc 



ctaatttttc 
ggctggtttt 
gattacaggc 
agcagtttta 
atacatcttt 
gttacttgtt 
tctggtagaa 
ttctggattt 
tatcaacagg 
ctgccaatcc 
cagaggcttt 
attttgatag 
tgaaagtttc 
gttcttacat 
cttaatactc 
agtaaataag 
tttggtgctc 
tttgaatatt 
ctccaaggct 
tgggttcagt 
tacttaatat 
agtctcctca 
gtacctcata 
tgagggtcat 
tctcttacct 
gaccctcaaa 
catgccagag 
aaaagctcct 
ggggtgagat 
agactgagtt 
gatgctttcc 
gctgcgatag 
gttttctatt 
tcatgtcttt 
ccggagaagg 
gagtactaga 
ctaggggcag 
gaatacttga 
agtaaaaggt 
tgcagtttta 
ttggctggaa 
gcagtgactc 
ggtatgttgt 
gtaaatatcc 
ctttagtgcc 
acacagctgt 
agagctggca 
tgcttgatga 
ccccattagc 
attaaatagg 
ctgattttgc 
acgtggtaga 
ccacgtaaag 
aagagaggag 
ctcacgccta 
agtttgagac 
tacctgggtg 
atcacttgaa 
cctgggcaac 
gcattctatg 
tagctcctaa 
aaactgtgct 
aatagcttgc 
tatttttttt 
gaactcctga 
gtgagccacc 
ttacctctgt 
tcagagggag 
gaatattgtc 
atacacgaat 
ggtttttcaa 
atagcttttt 
agcttgttgc 
cctaagcctg 
tggtttaaaa 
tcaaataggg 
ctttgctaga 
agcaggatat 
agatgcatta 
ttaaccatca 
cttccaaaac 
gtgcatggca 
cccatctctg 
ctgtgcctat 
aagggtttgt 
tatagaagtg 
gtctgcatat 
gcctcctctt 
actctgggac 
ccaaaataga 
tcagtgattc 
gtaggccatt 
ctgcggtcag 
tcatctccag 
ggtactcaca 
tctcagtcat 
attgcggagg 
agccttgcag 
tgcccagagg 
gaggaaagag 
gccaattttc 
tctggaatga 
cagtttgcag 
gcatatactc 
atccaaagga 
tatcacacgt 
atgtaaatca 
gaccagccgg 
ttcagtgatc 
gtggcgggga 
agccatttaa 
acgttgtttt 
atcttcaaag 
atgcattgta 
aatgaggacc 
gtagaggcca 
ctagccgaag 
taatcccagc 
cagcctggcc 
tggtggtaca 
cccaggaggt 
agagaagatt 
cactgagcaa 
ggtctatctt 
aaatgggtct 
tgaagtagca 



ttttttaatt 
cctcaagtga 
acacccggcc 
aatcttagtt 
aaatagaaaa 
agacacagaa 
gtaatttttt 
aatattgatt 
gtaagtggct 
atgtgaaggc 
gaggcctgga 
aaattaaaga 
tgcaaaacaa 
atatgagccc 
agcatcacaa 
atttcccttt 
ctcagtaatg 
ttgagagact 
gcgcagtgtt 
ttactcacct 
acttctttgt 
gctaattaat 
ctcaaaaaat 
tgactgtgct 
tgcccactcc 
agtccacact 
gtcttgggca 
tgagctgatg 
agcatgaaat 
aaatgcccgc 
tatcattaga 
gctgtcattt 
atggaaagct 
cagtagattg 
tggaaagaag 
gtgggaaagg 
agctgcaggg 
atgtaaaatg 
gtacttcact 
taatgccagg 
ttgtagcttt 
caggtgatat 
ttaggggaat 
tttccattgt 
ccagtgagct 
tgtaattgct 
tgtgctcgtt 
aaaacagctg 
tttcttgtta 
aactccatct 
aatgtgtggt 
agccagggtg 
ccgacggagg 
aagtctattt 
actttgggag 
aacatgggaa 
cacctgtagt 
agaggttgca 
ccatctcaga 
aggagagatg 
actatctgca 
agaaatatcc 
aacttgaatc 



agtagagaca 
tgtgcccgcc 
tggtgagact 
gcagcatgta 
tattatgacg 
cccaaagaag 
tttctccaag 
attatcctca 
cagttgtaga 
aagctgtggg 
gagatgtgaa 
attacacacc 
ataatagctt 
ataaggacat 
acaaaataag 
tactttttca 
atggaatcat 
gtcttctttc 
gctaaagcat 
gctctgtggg 
gtataaaaca 
tgagttgaaa 
gttagctatt 
ttgttctgca 
cagagaccac 
gtgtttcttg 
gggggtgagt 
gtcatcctcc 
tgtgctctgt 
ttgggggatc 
accttcctga 
attgagcatt 
gaaaagaaag 
ggaattacag 
ataaaagggt 
cctagcccag 
atacagatgc 
gattattttc 
gctgtaatgg 
cctttggctg 
gattttaaat 
ttatttattt 
tgccacactt 
tcagacccgc 
ctgtaagatc 
ttgataaatc 
gtaacaggtg 
cctgttgata 
tgtatgagag 
ttttaaaaat 
tcagaaattg 
gatctcctgt 
acatttccca 
aagatctgct 
gccaaggcag 
aaccctgtgt 
cccagctact 
gtgagccaag 
aaaaaaaaaa 
tggaggccca 
ccgtttgcgg 
aattaatctg 
cttattttta 



ttttaaaagg gagtaaaggg actgtagata agtaaaagat gctctgcact gcgcctctct 60600

ggtaccagtc cctctcgttt aggcagcggc cacttcccgc ggagctgttc acgccaagtg 60660 

accctgccac tgcgctgctc ccaccacccc atgtccaccc cgtcctcgga cgcctggtct 60720 

cagcacatca ccggtattct cttcctctta ccagtaatta gtttgagact gtgactcact 60780 

tctgtccaac aagatgtgaa gggaagtctt cctgggaggt ttctggaaag cgttctctca 60840 

cttgtgatag ccctgggaag aaatgctccc cgggtcctca gagctttgtt gtggctggac 60900 

gcatcttctg gaactgcgac agcggaggag gaagccaaga gagtgaacca aaacaaggaa 60960 

gggcggaggg cgggggaggc ctgcaaacct tacggcttat ttccactgac atcagagact 61020

catgttaata agtaacaagc ggctttgttt gttatgctcc tcagacacgc ggtaagggag 61080 

acacacagaa atgcacagct gtacgtattt gtcttgaagg ctagaattta ctttaaatgt 61140 

gagtggtttt cccaggaaaa atttatgtct gttctcttga ggaataatta tttcctactc 61200 

aattttatct atcgatccat ccatccatcc atccatccat ccatccatcc atccatccat 61260 

ccatccgata cagagcctcg ctctgtcgcc caggctggag tgcagtggcg ctatcttggc 61320 

tcactgcaac ctctgcctcc ccagttcaag tgattcttgt gcctcagcct cccgagtagc 61380 

tgggactaca ggcccgtgcc actacacctg gctaattttt gtattttttt tttttttttt 61440 

ttttcctgag acagatcttg ctctatcgcc aggctggagt gcagttgcgc aatctttgct 61500 

cattgcaacc tccgcttccc aggttcaagt gattctcctg cctcagcctc ctgagtagct 61560 

ggtactagag gcacgttcca tcacgcctgg ctaatttttt ttttttttga gatggagtct 61620 

tggagtctcg ctctgttgct gaggctggag tgcagtggtg ccatctcggc tcactgcaac 61680 

ctccacctcc tgggttcaag tgattctcct gcctcaacct cctgggtagc tgggagtaca 61740 

ggcgcgtgcc accacacctg gctaagtttt tgtattttcg gtagcaacga ggtttcgccg 61800 

tattagccag gatggtctca ctctcctgac ctcgtgatcc gcccgccttg gtctcccaaa 61860 

gtgctgggat tacaggcatg agccaccacg cgcagccttt ttttgtgttt tagtagagac 61920 

agggtttcac cgtgttggcc aggatggtcc gatctcctga cctcgtgatt ctctcacctc 61980 

ggcctgtcaa agtgctggga ttacaggcgg cagccaccgc gcctggccta atttttgtac 62040 

ttttaagtac agacggggtt tcaccatgtt gtccaggttg gtctcaaact cctgacctca 62100 

agtgttccgc ccaccttggc cttccaaagt gctgggatta cagggttgag ccaacgcgcc 62160 

ctgccctcaa ttatatttat ttctttgcct ttccttacgt ctttaactct tcacactttt 62220 

aaaaaagtta ttgccttcca aataatattt aggaatataa attatttgat attaatccag 62280

ggtaatttcg atttgttttt aaaaaagggg aataaaaaca ttattattca gaaggggtta 62340

aatacaatga caaaaactgc aattcagaat taatgaggcg ttataatagg gtttgttaaa 62400 

aaaattatga ggtatttaaa atagattttt ggcatatcct tttgtgactt ttggatagac 62460 

ttaagactta gtttatatat caatagtgag tctgtatagg aaaagaatat aatattcagt 62520 

gactgtcaaa ccagtgactg gagcagcttg gtatgaagcg cttcttattc tggtctccct 62580 

aatcagtgat tttcaatttt gaaaactttt ttttgaagtt gtgttgtttt atttttctgc 62640 

agaaatatct tctgcttttc attttaaagt atatttgcta tttatttgca atctagttct 62700 

catcattaaa agcagtacta aaatcttatc ccagaattta taggttgtgt cttttgtcct 62760 

ttttttgttt ttagtatttt tctgtcactt tacttcctca ggtgaagttt taacaaaaac 62820 

gagggaccat ggataggaaa gtaggaatga aacagtttac agggttgaag ttgtggtata 62880 

attctttttt tttgttttgt ttaaagacag ggtcttgctc tgttgcccag gctggagtgc 62940

cgtggcgaga tcatagctca ctgcagcctt gattgcctgg gctcaagtga tccctccagc 63000 

cttggcctca tgagtagctg agactccagg caggtgccac catgctcagc taattttttt 63060 

tgtttgtttt agagatggga tttggctgtg ttgaccaggc tggtcttgaa ctcttggcct 63120 

caaaccatcc actcgcctgg gtctcccaaa gtgctgggat tataggcatg aaccaccatg 63180 

cctggcccat ggagtaattc ttgtggagtt ggaaggtaga ggtgtgtacg tgtctgtttc 63240 

tcaaaatagt agcactagcc aggaaatcca tgaatttgca tatttttccc caagttcagc 63300 

ccatttgctt tggtgagttt ggggttatac ttagagtggg tagtataagg agtttctgcc 63360 

ctacacctta gcttaagcaa tttgagcaca ttgctttttg agttcaccac caaggatcca 63420 

gagctcagag gcagtctttc ctgtgcagat aagagtgcac cctgcctgca cctcacggtc 63480

ttgggctctg tggcttctct cctcctgcca ctgcccctta ttgtgggtag gctggaattc 63540 

cctatggtcc tttgtttggg gaagggggat gcttggatgt tcccgggtgt cacctgtgca 63600 

tgccccctat gctgtcctcc cacctgccct gtcctacaag catgacctgc acccttctcc 63660 

cacacaccca gaccgcagct tattcttact ctccctggcc agcccctctt cttggagagg 63720 

agaaaggatg atgtgaaaat aatatctaac attggggctc cccagcgact tccacaagga 63780 

gcaaggagct aggtgcatgt gtagacccca tgggagcttt agtgttagat accgagtttg 63840 

ctagatgaaa catcttttta attgaggtgg tgcagatgta ttgtttgaac actttagaca 63900 

ctaatgatga actacttgga tgtacatttt tttggttttt tttttttttt gctatgaaaa 63960 

ttagaaaaaa tatttatcca agacagtaag tattgaaaac tgatactggt gctgtatgga 64020 

tcactattat tgtattattt gaaactgttt ggaaaaggta ttgtagtttt tagaaaaaca 64080 

aagcaacctg aatattaaaa gtctgtgaat ttgagtaaaa aacagtccac ataagggaaa 64140 

aaatatataa ggaaggacaa tgaagttttg aaactgttac tataagaaag ctaaaggctg 64200 

agcacagtgg ctcatgcttg taatcccagc aatttgggag gctgaggcag gaggatcgct 64260 

tgaggccagg agttcaagac cagcctgggc aaaggagtga gacctcatct ctactaaaaa 64320 
taatttttta aaaatattag ttggacatga tggtggccac ctgtggtcca agctactagg 64380

gaggcttgag accaggaatt cgaggctgct ctgagccgtg attgtaccac tgcactccag 64440

cctgggcaag agtgagaccc tgtctcaaaa ataaacaaaa aagaaactta aagattttag 64500 

tctcaatttt ctacattgaa cccatcttta gatcatagca tgtataaaat taaaaatggg 64560 

ggaatatcaa cattattata tttaatgcta tagcttatta ttgtatttaa taagctactt 64620 

gtttaaagat ctggggtctc ttgggtccac agactgagtc tttctgaagg tgctttacac 64680 

gatgtagctg ccagggatct aggtcatata atatcctcag gatgggattt gaagacattt 64740

ttccagaatt tatcttttgt catattggat tttattttta aaaatttcct ctatagtcaa 64800 

aatttatata aatatatgat tctgatagta ccatatatat ttagatgggc ttatactggg 64860 

cgtgaacaag gttaataatc tttgtgaata tgtgggttat ctccttattt tacttattct 64920 

taaggaaaat taatttcact gtttaccaaa gaactgatag ctaaacccaa aagatttcaa 64980 

agaatgtttt gtttttgaaa tgtttctatt tatcactaat aaaacgggta tatctgttta 65040 

agttgaccta tctttggtct tactaaaaca aaatcagcta gaccatttcc caaataatca 65100 

tgcattcaat actctttttc tctctctctc cctgctccct catctctact cctttagaac 65160 

tttcagaaca ttcttttgtg tagatacagt gtttcatgtc tgttattgtt tctcactggt 65220 

cgttggattc tttcatgtga ccaccttttt cacgtttgct ctgattgcct ttggatgcgc 65280 

ctaactgtgt gcttttcctg ttaaggaaaa gaatcctgca tgtttttttc tcatcgaata 65340 

acaatgttaa aaacagaaaa gggttgtttt tcttctttgc agtaggcatt ctgtagtaga 65400 

taccttgaca tacttaaatt tgtgagatgt gtctagacga atggaagagt aatatctcat 65460 

attaatatat tgctaataat aagataaagg tttcagcttc ctggagctgt ccatataata 65520 

gaatttgtac ttgttttttc atttctgaga tcctcatact ttggggtttt ttttattttt 65580 

ttattttttc gagacaaagt ctcgctctgt cacccaggct ggagtgcagt ggcgcgatct 65640 

ccgctcactg caacttccgt ctcccgggtt caagcgattc tcctacctca gcctcctgag 65700 

tagctgggat tacaggtttc ctgccaccac acccagctaa tttttgtatt tttaggagag 65760 

ataggtttca ccatgttggc caggctagtc tcgaactcct gacctcaagt gattcgccca 65820 

ccttggtctc ccaaagtgct gggattacag atgtgagcca ccatgccagg ctctgagatc 65880 

ctcgtacttt taaataaaat gttaagatac atgctttatg cttttgctgc ctctcatgtt 65940 

tcatgaatac aagtaaaccc atgagtaact catgaataca cataaacttc tgggcctcca 66000 

aacgatgccc tgccagtggc catgccacag gaatcagagg ctgtacttca ctttgtggtt 66060 

gctttattat tccaccatta taagctttag tagaaaatgt aaagagggtt gttaaactga 66120 

aggagtgttg tctcaaactg aaggagaaaa gtagtgttgg tgctgtaaga tgtacataaa 66180 

ctaaggggtg tcttttctac catccagtta gcaattagga aagtccttct ttgctcatac 66240

cattccaaag ggagtcatct tattctttct ctaaatttcc ttacaatgga ggctgctaca 66300 

gtttaagtat cgaaggtcct tttttttcag atttcacctg cagtgcctat aaatttgggg 66360 

gaatgccttt ttttgggggt gaccaacata ctcagtggat cttggaccta ccaccaagtg 66420 

accttccttg ctcacctgta aggctgagaa caccgtaagc aaagtaccag gcttctttcc 66480 

ccaagagggc tttgtaagcg ttggcgccat aaaatcaacc tgaggactta ggtggctggt 66540

tatttctgag taagtgaata tcactctcaa atacgacatt ccagcaaagg ccatggttgc 66600 

atagccactg tttttagtta tgtcctggta actaggaaga tggattgttt tttaatctat 66660 

gcaaataatt atattgcgct gaaaaaaatg atactcaatt acagtttcac aattctggag 66720 

ggatcaggca gggataataa gataccattt ccagatgttt cctttctgtt tataaaagca 66780 

tagtcgactg aattgttagg agatacaggc agagggagaa gagaaagggt tccttatgta 66840 

tccagaatat agagtgttaa aatagcaaca atactgtaaa caaaagccgc agtcctcctt 66900 

cagtagttca tctgggccta gtcattaatt tttgttccac ttgatcttgg gttagcagtc 66960 

tcatgaatcc gtctgcttct caatgagggt tatagaaatc ctcttcccct ggtggggtct 67020 

cagcattatt tagacaatgc cataagaagc ctgtacccaa aagtacccag tatagttctt 67080 

ctccacgggg ctctaacaca gccccctctt ggtcgaaggt aagtcactct ggcctatagc 67140

taattgcaga tgctgatcag ggaagtgtca gagaaacaca gaaatctgta ggtgacaaaa 67200 

gattttaaat ggctatggtt ctcgtattac tgataatttt caaaactaaa tttattgaga 67260 

gttcattaca acagtattgg caactgataa gtaaagttag ttatggtgtg caaaacagag 67320 

tcaacccgaa aaagttctag atacaacatc tagaaacacc ataattaacc ttattttaaa 67380 

agaacagtgg atgttacatc taatttataa aaatggaaga acataatctt tacagaaaaa 67440 

atcttcagat ataacaaaat agtcccaaga catartatac aatgaatatg ccaagcatat 67500 

aattagaata gaccaagaat atcacatcaa gagggttatt ttagagggga cataaacacc 67560 

tatgtattaa taacatatat ttaacctagg gctggctatc ttttttgatg tgacaatttg 67620 

tcccatataa cttatcaata gtaacacatc aaatggatct cctaattatt tcaagcatct 67680 

gttttttatt aaagtaaaag cacaaatact ttttattttc caggtatgtc tggggaatct 67740 

tagacagttt tttgttttgt tttgtttttt tgagatggag actcactctg tcacccaggc 67800 

tggagtgcat tggcccgatc ttagctcact gcaacctccg cctcctgggt ttcaagccat 67860 

tctcctgccc cagcctccca agtagctggg attacaggtg cctgccacca tgcctggcta 67920 

atttttgtat tttttagtag agatggggtt tcgccatggt gtccaggctg gtctcgaact 67980 

cctgacctca ggtaatccac ccgcctcggc ttcccaaagt gctggaatta cagggataag 68040

ccaccatgtc cagcctcaga cagttttaag tacaaaatat atcatttagg atttgatttg 68100 
cggaaggcaa aatatcaaaa attatcaaga aattttgaat acctgattcc aataggatca 68160

tgtaacttag aaacaatttt tgactaccta tttaatcaaa gtgactgtaa aaggttttaa 68220 

aagtaaacag agaggtaaca tgattgtaaa gaaccttagc tctttcctaa gagacacgaa 68280 

ttcttgaata ctcaagggta aaataaagtc aatataaacc atagaaggtt attctcataa 68340 

aacacagaat ctttggaatc taagccaatt atacagaaaa aagaataagc ctttattttt 68400 

taggtgaatg tggtaaacag taaaccaaag aaacaggctc atcaatattg ggtaaacttt 68460 

tctttgtttt taaatgttta gtctttagtt ttaagagatc atctgcattt tttctgtaat 68520 

aaacttaaaa gatatccact tatatttctt cagatttatt aattctgtag cattttaagc 68580 

attgaaatga cagtttttct ctcaatcctt tttttttttt tttttttttt tgagacggag 68640 

tcaggctctg ttgcccaggc tggagtgcag tggcacgatc ttggctcact gcaagctccg 68700 

ttctccccag gttcacgcca ttctcctgcc tcggcctccc aagtagctgg gactataggt 68760 

gcccaccacc atgcccggct aattttttgt atttttagta gagatgaggt ttcacagtgt 68820 

tagccaggat ggtctcgatc tgctgaactc gtgatctgcc cacctcagcc tcccaaagtg 68880 

ctgggattac aggcgtgagc caccgcgccc agcctgtctc aatccttaac aatgctatat 68940 

ttgttgtatt tcatatgttt agctttctca tggagaaaaa gaaacatagg cataaacctt 69000 

tatactatcc gcctgctggt cctgcaacat gagtttaata aagcgttcct gatacttaaa 69060 

caatttctat gatgtcagca gagagatatc agcaagagtg attgtaaagt agctagcctt 69120 

ataagtcaag agttataatc tttgatccac tgctcaatcc atttcaagat ctgatctaca 69180 

ttattttcta gctcttctgg tttattgctg ggcagccgat gcacaacttc ttccttgtag 69240 

gatgccgtgg cttcttcata aagaacttgg aaaatctcac actgaatatt gtcttttagt 69300 

ttcttctcat tataacccct catttgaagt atttcgtaca atatgttggc atctattctc 69360 

aacacaaaaa ctatgtgaaa ccagcgttca gggaagaaat cacaaccgtg gtaatcaaca 69420 

ataactccac attctctcat ttggttatct aactcatcag ctactctgtc ttcatctaaa 69480 

atgggacaat tatactcttc atcatagtcg tcatacaatt rcttttctca agctaaatca 69540 

cccacattaa tgtatttcaa tcctgatttt gattctgata tgtggttttt ccaacccctg 69600 

gtgtacctgg ctttctatga cacgtttcta tcaccaagtc agaacaaagt gacactttag 69660 

gactgaactc agggagtctg tggggtcaaa actaatttca taatactact aagactttaa 69720 

catgcaatgg gttcaccttg ctgtctccaa aaaaaaaatt gcaccactgc actccagcta 69780 

gggcaacaga gcaagaccct gtctctcaaa agtaaataaa taaataattt aaaaaattat 69840 

tgttaaaaaa agtttgtcag gttaatgatt caatttgatt aagcacaaat ttacattttt 69900 

tcatagtctt aaactttagg agtaacgttc acttatttga tcagtaaatc tgtatagctt 69960 

ttgtaagaac atgtaaaagc agaatagcaa tgtatagtgt ggctgggcac agtggctcat 70020 

gcctataatc ctagaaattt ttggagtcca agatgggagg attgccgagg gcaggtattt 70080 

gagaccagcc ttggtgacat agcgagagac cccatcttaa aaaataagaa taataatact 70140 

taatgctgac aactcataga agacatgact atttttatta aaccccaaat attcaactag 70200 

tctcatttgc caaatattta cctaaatgtg tgaacttgaa ttcttaaaac atttacgttt 70260 

ctataggaat acttttttta gtgctgttga aagtattatt ggaagttcaa tttccttaat 70320 

ttctgggaat tttaggaaga ttcaatttat aggtgtctct ttatttctaa gccagtcaga 70380 

acagaacatc cttaagagct atcacattct cacttggtaa gaccatctca tgatggttat 70440 

cccaggatga gagacaatag ctgctttgaa agttcccctg ccacactggg cttccagtac 70500 

cagtgcagct aatgaccctg ccctaacagc aaatgctggg gagcagggtg caagtgttta 70560 

cttgggtgcc cttcacgggc actcctttta cgtggtggac agcctgatgc tttgttctct 70620 

aaaccagtat caggcattcc tctcatggga gatgtgctta tcctggcaga cgcccttgtg 70680 

gctcttttct gacccctctc cagtttatga ctgcctgacc atcgctctgg tgctcagagc 70740 

ctgcccttgt gttcctcccc agcatcccgg ggaaaaccca ggtagcctgg gagagcccct 70800 

ggttcttcag atggaatgtg caaattcagc acaccaacac gataggaaat aagttccaag 70860 

atttattact tccagatcct agagagggag ggcgccatga gtcgggaggg caatgctcta 70920 

tccccaggtc accagaagaa tgaatgaagt gtcaggcata gagcaagaga gagtgggacc 70980 

catgggccac cacctttact gggggccagg gcattgtcca agcaggtttc ctgcagggag 71040 

ttttagttgg tgagtttaaa acaggcagcc atgagtttca ggatcacaca gcaactgaga 71100 

ggtggtccct gtggcatact ccacagtcca tgtggggtgt ggggttggca gggcagccag 71160 

gtagactgtc tcttagagag gccgtcacca gaaagaggag gtgtataagg cagatccctg 71220 

gatcaacccc attgaggact gggggtggca ggtggaagct gtcgagggaa actaagccct 71280 

gtttctggta tgagaaggtt aaacttatca tcaaaataga tgccaaggct atatgaaact 71340 

gtcagtattc actacagtgg catttccaca gtacaataca gacatacaaa cagacataga 71400 

taatttgtaa gctgtaattc taaaatttca ggccaggcgc ggtggctcac ctctgtaatc 71460 

ccagcacttt gggaggccga ggtgggtgga tcacctgagg tcaggagttg gagaccagcc 71520 

tggccaacat ggtggaaccc tgtctctact agaaatacaa aaattagctc ggtatggtag 71580 

tgggcgcctg tgatcccagc tagttgggag gctgaggcat gagaattgct tgaacccggg 71640 

agatggaggt tgcagtgagc cgagattgca ccattgcact ccagtctggg caacaagagc 71700 

aaaactccat ctcaaaaaaa aaaaaaaaaa agaagaagaa aaaaattcag tcatagacca 71760 

aacttaaaag cagaaatata aaattttact cagatgtcta cttcctgatg gcatgaaatt 71820 

cttaattgtt ttgaaaccaa agtagaaaag cagacaaacg aaaaatacta gcaaatcaga 71880 
ttctgttatc 
tacgtagtat 
gccttttgtt 
acagttggtt 
tcatttttgt 
aacatttggc 
tacaggaaaa 
agcgtgcatt 
attccagtta 
gtgggggatt 
aaatgagaat 
cccttcttac 
tggctaactg 
gtctggatta 
ttttccagga 
gatgtaactt 
ttttgactgt 
gacaagcatg 
cttttctctc 
tcacttttac 
ccaaatgtga 
tgtgctcgtt 
aagtggaaca 
aaagctttcc 
ttgttggcat 
aaatacagtg 
tgtttttgat 
tttcagtgaa 
ttttcctgcc 
tgaaaaaggg 
aatagtctaa 
tttttccaaa 
aaataaaact 
tcttgtgata 
tgcagagaca 
gttctaggga 
atcatttttg 
ctcgggttga 
atctcttaag 
caggttttga 
ttggttcacg 
taggagttca 
aagttagcca 
ggaggatcac 
tccagcctgg 
ccatcatttg 
cccctttgta 
ttatttattt 
gtgcagtggc 
tgcgtcagcc 
tttgtttttt 
agaaagccct 
aactaatggc 
ggaagttgga 
agagcaggag 
ttgatgattt 
ctcaataaat 
agataggttc 
gatgtcaggg 
cggagtctct 
ctccacctcc 
aggcgcgtgc 
atgttggcta 



tttcacccaa 
acaaaccgct 
ccttctcatt 
acatttgaag 
ttatatatgt 
tgaatatact 
tttgtgtttc 
tcttagataa 
atagcagatg 
cacttgacag 
gaggctgcct 
cctttctcac 
gtggaacaga 
ttttggattg 
atataaaggt 
ttcaaaaaac 
tttttgttcc 
ccatctgagt 
ttatactcta 
tttataaaat 
gcgacttagc 
ttggtgttct 
tcattgacca 
acagcagtgt 
gctcttatca 
ctatattctt 
atcaatcata 
aaatatgctg 
tggggcttag 
ggtcactcaa 
tacagaagcg 
ataaaagaaa 
tttcatttag 
cttaagaata 
cttgttactt 
gtgtgtagtt 
cctttatttc 
agtagtgatg 
aatttaaaag 
gatggacata 
gctataatcc 
agaccagcct 
gacttggtgg 
ttgagccctg 
gcaacagagt 
gaaggaagag 
atataatttt 
atttatttat 
gcaatctcct 
tcctgagtat 
tagtagagac 
ccattgggga 
atagaaaggt 
cggaaagaat 
ataagccata 
aataaaaagg 
attacatttt 
aaataggcct 
gcctggttcc 
ccctgtcgcc 
cggattcaag 
caccacgcct 
ggctgatctt 



cagagacaag 
tcatgtctgt 
tacttcacct 
tatttcatgt 
atatgcatgc 
gccaaattgt 
tgattatata 
gcaaaaaaaa 
tcaatagaac 
gtgcaacaga 
agaagtcttg 
tttgaagtgc 
ttcctggggg 
ctttggggac 
tttttttctt 
ttattacagt 
ttcttgtttg 
aagtacttgt 
attctgggtg 
taatatctga 
cttgaaaact 
tttgcttgct 
aaacatttcc 
taaagttgct 
tctcccttaa 
tgcaacagat 
ggttttaagg 
cagagagggc 
aaataaaagc 
aatttttgta 
aatattgaat 
aattactagt 
gccatcttct 
agacctggac 
actggcacat 
tagagctttt 
tttccaagtt 
aggcccagat 
catgatataa 
cctaagatca 
cagcactttg 
gggcaacata 
tgaattgcct 
gagatcaagg 
gagaccctgt 
tgttagaggc 
tggagaggag 
ttattttgag 
cccactgcaa 
ctgggactac 
agcgtttcac 
ttttttaaat 
tattataaaa 
atattttttt 
atggtcatga 
gtcttttttc 
caaaataaaa 
gaaaaacaaa 
tttttttctt 
caggctggag 
ctattctgtc 
ggctaatttt 
gaactcctgg 
atctctataa 
cattttcgtc 
tgacttttca 
aaattacaaa 
atacacatac 
taaacaatag 
tttctatagc 
aattaaacat 
aaataagttc 
agcacaagca 
agaaaagtgg 
atgaagacgt 
aaattttttt 
agtatctgag 
tgacatatgc 
ttatttctgt 
aaatctctag 
tttgatttct 
cctttaggca 
gttagaagat 
ctggggttgt 
tgataccaaa 
cttaaaggtc 
atgtatgcat 
acatttaaca 
ttttgaattc 
ttttaagaca 
acctttagaa 
actgatcatc 
aatatattat 
atatgtgtaa 
taactgctta 
tgtcttactc 
attctgattt 
ccagcaagca 
tactttttgt 
taattatttt 
cttgactcac 
ttcagccctt 
ctagagataa 
agggtcccag 
gcaaaacctt 
atagtcccaa 
atgcagtgag 
ctcaaaacaa 
agtctgtata 
agatgtttat 
acagagtctc 
gctccacctc 
aggcacccgc 
catgttgttg 
tttctgggag 
gggaagaaag 
aaaggatatt 
gctttgtgac 
ccctcttagt 
taagtgaggt 
tcattgcccc 
ttttcttttt 
tgcagtgaca 
tgcctcagcc 
tgtagttttt 
tgatccaccg 



accagcagtc 
aaccctgggg 
agacatattg 
agtatatgaa 
acacacactc 
tcatttctag 
atttaaattt 
tttatttaaa 
ccttatccat 
ttattgtgca 
ctgacgagtc 
tgacacactt 
gttttgctct 
tttctatctc 
ttaaatgttt 
gggaaaaata 
ccaacaagaa 
gttcaatgta 
acttgtcaat 
cactgaaaat 
ttaggcagca 
tagcttcatg 
ttaaagcaat 
tttgtggaag 
caacaaagaa 
ctgtttaaag 
tccatcaaaa 
cattttcagt 
aaacaccata 
gaaatatatt 
tattttttaa 
ttttctcatt 
tttttttctc 
tatgtggatt 
gctgccagcc 
ttttgttttt 
tcttgactca 
acatcttttc 
tcattttaca 
aactaagaag 
gtggacatat 
gtgtctacaa 
ctacttggga 
ccatgattgt 
taaaataaaa 
agcatagaca 
ttctttttct 
cctctgtcac 
ccgggttcac 
gaccacgccc 
tatatatcac 
agagggaaaa 
aactgagggt 
ttaagtatta 
aaataggtcc 
agaaaaacta 
tcttggttct 
agtgggaaga 
ttcttttttt 
cgatcgcggc 
tcctgagtag 
agtagagacg 
gcctcggcct 



cttccccaaa 
tccttcaaat 
gttatactac 
taatgtgaat 
ctatagagtg 
ctggtggaat 
tttgcaagtc 
ttttttttca 
gcttctgtat 
cctgtgtctg 
tacaaaaaca 
ggaggtctgc 
tgtacctcat 
ttggcctgtt 
atttttaagt 
ttttttakgt 
cattagtcat 
aaatgttaac 
ctgtcctgta 
taaacatgta 
ttaagaggtg 
aatgttcaag 
actgcagcag 
ggtcaatagc 
catccaacaa 
gggaaaacca 
cattggaaca 
agtgggatcc 
cattatatag 
gaacattcta 
agtctttgta 
caagatttaa 
cacatggact 
agctgagcct 
tcaggatgga 
gttttctttt 
agcacacatt 
taccctaagg 
gataaagaaa 
gctgggtgtg 
tgtttgagcc 
aaaaatgcaa 
ggataaggca 
accactgcac 
ctaaggaaca 
ataacctctt 
atttatttat 
ccaggctgga 
gccattctcc 
ggctaatttt 
agtgtggctt 
ctaatgtcag 
tgtttggtaa 
agggaatgac 
cagatttgat 
tgtgttgata 
gagcatgcac 
gtgttggtct 
ttttttgaga 
tcactgcaac 
ctggaacaac 
gggtttcacc 
cccaaagtgc 
cgggattaca 
gtccctaccc 
gccattaaaa 
aagtcagtta 
ttattagtac 
tttgttaatt 
aattgaaagg 
ggaagccact 
cttatgattg 
gtgggtcacc 
cacactttcc 
tacccttcaa 
aagtgagtag 
aaatgcatgt 
ctgtggtggt 
tacttaggat 
tatctgtaac 
attattttcc 
tcctcaacct 
gtatgttacc 
tctattcata 
acatgtattt 
ttagtgcttc 
tgcattggtg 
atttggatga 
taccatggtc 
aatgcattct 
ggatgtgaat 
agtagccatc 
cccttatttg 
atgccaaaga 
atatggaaag 
acctgtgaaa 
ctatccacag 
ggactactgt 
ccaattcaaa 
gaaaaaatgt 
ctgggacttg 
atggatagag 
ccacagctaa 
tgggattacc 
ctttagtcta 
tgaatgaggt 
gaagaaatag 
gcaatctaca 
gaattccatt 
aactacaaaa 
tcatggatta 
tttaacacat 
taaaatttct 
acaaggagga 
ggcagtgtgg 
atacatccca 
tgaacaagtg 
ctttgggtgg 
ccgtctctac 
ctactcagga 
ccaataccat 
aaaagaaaag 
ccacaaaaat 
gtaaacatac 
atgatgaata 
tgtgaaagat 



ggtgtgagcc 
agcccagcca 
tagaattgag 
agatatacgt 
taatattgag 
tttcccccga 
cttttcaatt 
tcgcaccctg 
ggtaagccct 
ctgaggtgcc 
tgtgagctgg 
ccagggccaa 
cctctaagat 
tctcacagtc 
ggtggtttgc 
gaaggatgtt 
attatttgcc 
ttattcataa 
aaggttgctc 
attctgcatc 
ctcatactgt 
tcaagaatgg 
tagttgtgaa 
gcaatggggg 
aagacagtag 
tgaaaatatt 
tttctgacta 
cgtccttcag 
ttggttatca 
gcctaataat 
ggagctccaa 
aaaaatggta 
ttgtgaagaa 
ttttaggcat 
ctctttaata 
tcgtctctgt 
tatttctaag 
gagcagtcca 
acctgctccc 
tcatacatag 
tgatatgata 
gactccttgc 
acaaggcaat 
agttatctct 
aaacagcaat 
gaaacttcag 
tactgatttt 
gaagacatag 
tctcattcaa 
atggagatat 
ggactggcac 
tattggtgga 
tgggttttca 
gtgctggagc 
atcacctgag 
caaaaataaa 
ggctgaggca 
gccactgcac 
gagaggagag 
tacctccaaa 
aagaaaaatg 
acagaaaaca 
attatgaaga 



accgcgccca 
ctgtgggaag 
atctgaagtt 
accataacca 
tgtaactgct 
tttgacagaa 
gcaccagacg 
aatgtgctgc 
gtgtgtgaat 
gacatcagca 
gaacacccgt 
gttctggggc 
aaagcagaag 
aaagagcttt 
agagccaaaa 
cttttaatcc 
cttgtttctg 
aaactgaaat 
caaagcattc 
tgtgggatcc 
gttcatttaa 
ccgtcgtctt 
gtggaaaacg 
atcaccttac 
tgcccctctc 
aaatgggaaa 
gcatgaagaa 
tccagcctgt 
gatagaatct 
gttccccaag 
ggtgctttct 
tgctgacgta 
ggagaaaaaa 
cccccagggg 
actctagcat 
gtcttcatct 
gatatgcatc 
tagaggtcag 
tggttgtctg 
aaacagaact 
aaggtgggca 
ctggagggaa 
cagacaagaa 
atttataaac 
tagaggtgac 
tataaagcta 
aaagctcaaa 
tacagtgaac 
aatctcagtg 
taaggagcca 
tacctgcttt 
cacacagaac 
caaaggcatg 
agttggacat 
gtcaggaatt 
aaaactagct 
caagaatcac 
tgcagcctgg 
gaggaaggaa 
tggatcatag 
atcttggtgt 
tagataagtt 
gatcagaaga 
gccaggggcc
ccattgacag 
tatttcccca 
aaatcagttt 
ttgatgggca 
agcagaatgt 
tctgtgagam 
tgggaattgc 
gcgtatttta 
ctcaggccgg 
ctttcctcct 
aacaggagga 
caagattaca 
cctctatgtg 
taattcagtg 
catttggata 
tagattaaag 
gaactgttat 
ctttctggtg 
gtcttccctc 
accagtagaa 
ttttccgtgt 
ttgaaattcc 
ctgattatat 
atccagggtt 
tcccagaaaa 
atctcaggtt 
gcatggagta 
cgtgattttg 
cacaagagta 
ttaagtgaaa 
gctaaaatct 
ctgtgcatac 
gccacggact 
cagtgaatga 
gactactctc 
tgtacaggat 
acgtgagaac 
catgtctctg 
gggtgaaatt 
ttaaaacaca 
gaacctgggg 
aagataataa 
cacacaattt 
agctgagttg 
ataaaataag 
gaactaaata 
atgtcacttc 
gactctttca 
gaatagccaa 
tggggcatcc 
agacagagaa 
aaggcaattc 
acacaatcaa 
ggagatcagc 
cggcatggtg 
ttaaaccggt 
gtgacagaga 
gggagaacct 
acaaaattta 
aggcaaagag 
agatttcatc 
aaacgtttgc 



tggtttctga 
cctgtgggct 
ggtttcaaag 
caaattttgg 
tgtgcaacaa 
cgtcatccag 
cacgactcac 
gcgtggctgc 
aaacaaggca 
cgtgcaccct 
gttggtctcc 
cggggagggt 
aagatgctga 
tgaccaagaa 
attgtttgta 
ggttttatcc 
atagctttta 
tggttctatt 
acagtagcat 
ctcctctccc 
ttataacatg 
tgtgacagag 
aaaagtaagc 
attagtactg 
ttgttttgtg 
taacaattta 
atctggctcc 
ggtgctgctt 
cagtgtttgt 
ttgatgctga 
aggtgaacgt 
atggaaaaaa 
tatatatata 
gtgccccctt 
gttctgtgtt 
ccttccctca 
tccttaccca 
gtactgcctt 
ctcagtgttc 
ttaggttatt 
ttatttaata 
cactcagaca 
aaggcatgta 
tctatgtaga 
agcaagtcat 
tgcaggatct 
tattaaaaga 
ttcccaaaat 
agatacagac 
aacaatttag 
tttcaagctg 
tccagaaata 
agtggagaaa 
gaaaaggaac 
ctggccaaca 
gcacctgcct 
gagatggagg 
gacaccctgt 
cattctatac 
aaggtataaa 
ttcttagata 
aaaattgaaa 
aaatcttata 



tgctggctct 
tgtcttctca 
cattgattat 
ctttctagtt 
agtcattcat 
gttgtggata 
gtgctttccg 
tgggttctct 
ttttgataga 
tgtggatctg 
cgtgggctgc 
agagagcagg 
aagaaacgca 
acattgtgag 
cagatggatt 
tatgtatatc 
aaaatacata 
attactttca 
cacttgttac 
aagaatgtat 
caaaagctac 
gttaaagaga 
actgttcatt 
ctttatgttt 
tagtttcagg 
taagtcttta 
attctccctg 
gccctcactt 
cttcaaggaa 
caactttgat 
tgtccactta 
tgactctttg 
gggttcagaa 
tggatagggt 
ttatttctct 
ggttttggag 
acttattctt 
tgctgtcgac 
tgctagtact 
gtatctcttc 
aacttctcac 
cataagtgaa 
ggttagaaag 
caagtcacaa 
ccagatgcaa 
gtgtgctgaa 
catacaatgt 
gatgtataga 
aaactggttc 
aaaggaaaga 
tggtcctcaa 
gacccccaaa 
ttcagtcttt 
cttcccaaca 
tggtgaaacc 
gtaatcccag 
ttgcaaagag 
caaagaaaag 
cttacacgag 
acttctataa 
caccaaaagc 
gcttttactc 
tctgacaaaa 
75780
75900
75960 
76020 
76080 
76140 
76200 
76260 
76320 
76380 
76440 
76500 
76560 
76620 
76680 
76740 
76800 
76860 
76920 
76980 
77040 
77100 
77160 
77220 
77280 
77340 
77400 
77460 
77520 
77580 
77640 
77700 
77760 
77820 
77880 
77940 
78000 
78060 
78120 
78180 
78240 
78300 
78360 
78420 
78480 
78540 
78600 
78660 
78720 
78780 
78840 
78900 
78960 
79020 
79080 
79140 
79200 
79260 
79320 
79380 
79440 
gatttatgtc
tcaaacaaaa 
gcaaataagc 
ccactacata 
ccagggcagc 
agtttgacag 
cctcggagaa 
ttacttatca 
caaccagctg 
ggtacagcag 
aggtcacatg 
aggagaacag 
caatagcagc 
ttacagaatc 
tcttattgtt 
ctgtaatccc 
ccagcctggc 
cggtggcagg 
cttggaggga 
agagtgagac 
aatggagatg 
ccttctgtgt 
cttcctgtta 
aatttaaaaa 
ctcattcatg 
tcgctatttg 
ttcaggtgaa 
acacatatat 
gcaatggtaa 
ttcacaggtt 
tctgttcatc 
cctttaactt 
ccaccaccat 
ttatctatgt 
tcaagattta 
ctctctctct 
ctcctccttc 
tagcagtata 
gtgtggtatt 
tggttgtatg 
gtttgaattt 
agttcacacc 
ctccttcatg 
ccgagcttcc 
tgaccaaagt 
atttgaacct 
cctgtgactt 
agcagagtgt 
ccagcacagt 
cttatgaaaa 
gggcttaaaa 
aatctagtgt 
atccacattg 
atagaatcaa 
tacattaaag 
ggaatctcag 
gggatcatct 
tcatgttagg 
ctgagggtgc 
aggatctcag 
gctacagatt 
tctcactgga 
agtttagtga 



tggaatatat 
aatggcaaag 
atctaaaaag 
tccattagaa 
tggaacggct 
tttcacagaa 
atgaaagctt 
tcagctggac 
gaccaaccat 
cttgcatgac 
ctgtataagg 
atgagtggtt 
tgtcacatct 
catgtattaa 
tctaaattta 
agcacttttg 
caagatggtg 
tgcctataat 
ggaggttgca 
tctgtctcaa 
tttgagcact 
gctgaagttt 
gcactgaagc 
cataactttc 
gcaaaacgtc 
ttaatagtat 
atatctgttt 
aaatgacaca 
ccgcaatggt 
gctcctgact 
tcmgtcttca 
ttaaataaat 
ttacaaagcc 
cttgcatata 
aagtatccat 
ctctctctct 
taaccctgtc 
aactggcctc 
ctctggaaat 
tggcccagct 
tgacttaata 
tccagtgtta 
actgaattgt 
tatttatggg 
gtccctttag 
ttaaatactc 
gtctaaaatt 
cctgtgattc 
gaggaataac 
ggatctgata 
gctaaatcag 
tttcttaatc 
aacttagtaa 
tgaactgtta 
aaatgctgag 
cttttcactg 
tgaaaaacca 
aagaatcttc 
ccttcaggtc 
gcctgggtct 
ataccttggt 
tcctgacaga 
tggagactct 



aaagaactct 
aaaagatttg 
atgctcatca 
tggctaaaag 
gctggtgcgt 
agctaaatgt 
gtgttcacac 
ctggaaacag 
actgtggagt 
tctcaggggc 
ccatttcttt 
gccgcgcatt 
ttggggcatt 
aacacagaga 
aattaaaaag 
gaggccgagg 
aaactccgtc 
cccagctact 
gtgagccgag 
aaaaaaaaaa 
ggtaggaccg 
acaggctcct 
ttcatcccag 
tctaaattgc 
acaaatgtat 
caactcttgg 
gtgtgattac 
aacagatata 
aaccacgatg 
tgcaccctca 
agaagacaaa 
tgaatagtac 
attctacata 
acttcagata 
atcatatatt 
ctctctctct 
tccaatgtag 
caagaaaaac 
ttcttgtaaa 
agtcttatca 
tttttgagaa 
gcgctactgt 
ctggcagata 
agacaggaag 
actcacattg 
tttatcccac 
ctactttccc 
agtcttccct 
tcagcctgta 
atagagattt 
ctttacaaca 
aaaaatgatg 
gcagtgagtc 
gaataacaca 
ccgctctcct 
gtgttagttc 
ttctcttttt 
aggcacattg 
tctctggcag 
ctgaactttc 
aaaggacttt 
acatttttgc 
gggcaaaaat 
taatactgaa
aatagacagt 
ttattgctca 
aaaaaataac 
gtgggaagtg 
ccactcagca 
agagtctgta 
cacagctgtc 
gtcactcagg 
atcatgccaa 
gtcattctag 
aaggtgggag 
ggaattgtgc 
actgtacaca 
aatatctagg 
cgtgtggatc 
tctactaaaa 
caggaggctg 
atcacgccac 
gtatatctta 
ggctagtgtc 
gtaccttcaa 
cttttctatc 
tctttgccct 
gtctgtattg 
gagattgcga 
aaggtaacca 
aatatatgtt 
actctcgctg 
aaaagtttag 
gacgactgct 
aaacataaga 
atttttaaag 
taaacttcac 
atatacattg 
ctggcactct 
ttgggggatt 
actgctgagc 
ggagatttgt 
gaaactgtgg 
gtttattggc 
tttcaggaaa 
catggaaata 
taacacaaca 
ttttgttatg 
tctcacttaa 
tcgaaaccct 
ttttccagct 
ttcagatttt 
aaagctaatt 
aaatgtcaag 
tcatgatgac 
agatgagata 
ctcagttcat 
aaaattataa 
atcaccctgc 
aaccttcagt 
tacttggtgt 
acatttattg 
acggcttgat 
atacttcaca 
agccgagaag 
agcttgtctg 



caataagaaa 
ttactgagga 
cttcagaaat 
agtcgcactc 
gtccagccgc 
gtcccactcc 
cgcgaatatt 
cctccagtgg 
agtcgaaagg 
gttgaatagc 
acaaggccaa 
tagcatctgc 
tgtgttgtta 
catatgcaca 
cggggtgcag 
acgaggtcag 
atagaaaaat 
aggcaggaga 
tgcactccag 
catatctaac 
ttggtttcag 
ctgctgcctc 
ttaaaaaaaa 
ctgtgctacc 
cccttgcctt 
aggctcaggt 
tgatggcagt 
tgtgtgatta 
gcacaacagg 
aaacaagccg 
gcttcttgca 
aatttgagag 
cttagcaccc 
agttccaatt 
actttgtgta 
cgctctctcg 
cttaaaatat 
atgtttttat 
agcagttctt 
cgattttata 
aatttttcca 
gagaataatt 
gaaaaccatg 
gaaaaataaa 
tgttgttcaa 
tttgatgttt 
tttgtggatg 
accactccgt 
aatattttga 
cacttataaa 
gccgctaact 
tattttcttg 
tgtttttatc 
tccgttcacg 
ctcatggcag 
attcctaagt 
tggcagatta 
gtcacactga 
ctcgcacttg 
ttcaaagtcc 
gagtgttttc 
gacgctgcaa 
acttgaatgt 



acagaacagc 
cacacagatg 
atagtgagat 
tagcaaggag 
tttgagaaac 
cagatatttg 
tgtagcagcc 
gtgaatggat 
aatggtgata 
tggtctcaga 
actataggga 
ctctgcagaa 
gtggcaatgg 
cacgagtaaa 
tggctcatgc 
cagttcaaga 
tagctgggca 
atcgcttgaa 
cctgggtgac 
gtgctttcca 
aactaggttt 
tgtacctata 
aaatgaaaag 
tttttttccc 
actgatgatg 
ggcctatggc 
caggtatatc 
caaggtaaac 
agtattgatg 
agtcactttc 
tggcccccct 
aggatagttg 
actttaatat 
tcttttaggg 
caaggaatct 
ctctctcgtc 
tctctttggc 
ttcagggttt 
cagaattaga 
acaaagttca 
tgtttacagc 
tatgtttttc 
ccaggagttg 
gaaattaatt 
gcatagcaca 
cctgcacttt 
ctaacataca 
gtcactctgt 
ttctgaacag 
tacaagtgta 
atcaacagat 
agataatgtg 
agtggtgagc 
cgtctcattt 
aaccagaact 
ctgttcaaaa 
acttcataac 
cactgagttt 
caagctgact 
tttttatcct 
acatgcactg 
ataattagtg 
ggatcttaga 
aacacatctc tgtcaaggca ttgttttaag gcagtgacta tggtcttaca tttatctcca 83280

ggacacctaa tttatacttt ttcctgatta aaataatgga ttctggtttt gcccagacat 83340 

agaacccaca gagtttgtct gcttctttca cttgaggtgg ttcctgagca gtgccagagc 83400

tcattctctg cggaggctcc tgcaggctgc ggcagcgtgg cctctggccg ctgggagcat 83460

gggaagcagg cgctgcggtc taggtcctcc atccccctgt ctgctgctcc tggcaagacc 83520 

ccaaggtgcg catttcccag gttggagccg ctgtgcttcc caggaccata atctgctgat 83580 

tgaggacaga taccaaaaag tgattcatct gtaaaattga gggctgtggt gctgccctct 83640 

aggaggacat ttggaaagat gtggagaaac ctgtgagtgc taagaatgac tgatgttaaa 83700 

gtttgaaaga gtcaaagtga tttttttagt gggagaagac tgtggagtca ccctgagatg 83760 

caaccacagg cttgattaga aataaagttt gatcaccatt ttcaaatttt tacattaata 83820 

ttttttaatt ttcgaaaggt gctaaacaga atctacttaa tgcacctggc acagaaaagg 83880 

cagtgcccgg gtcctaaggc tgcacctttg caagaaagag maatacctga ggcaccggga 83940

gtgaggagga caggtgttgg agaaggctgt agggccccag tatggctgtg tagttcaaga 84000 

cgagggatgc agaagccatc ggactatttt aattacagag tggcagcttt tgtctctgtg 84060 

gcctctcagc aaagaatgga ttgcagggag gtaagaacag ggtgagaagc aggaggcagc 84120 

tagggtcatc gaggtgaaaa atgactgcgg ctgtgtctag agggggggtt gataggtgga 84180 

gaggagagag caggtcggcg cccttcctag gaagatctag tggaatctgt aacgtcaggt 84240

gtgtgggaat ggagaagtca agaagactcc cacccaaatt ttttcctggg gcgactaact 84300 

atagataatg gtgccatttg cagagttagg gaattctggg gcagaagatt gtgtgcaagg 84360 

tttggggtac aataaaaaat tgatgtaggc atattaggtc tgagattcct actggacatt 84420 

caaatagaga tactacatat cagattatat atatgtacat atattcagag gaaaggttaa 84480

ctattcactc cagccatggt acctggaagg gagtgtgaat gaagaaatga agaaaacagt 84540 

gagtttaggt ttgatctctg ggctgtgccc tatgcagaag tcagggggaa gggggaggca 84600 

gggggacccg ggaacggcta gctagcaacc tgggggagac accaggggaa catggcatca 84660

gtcagaaggg ggactgtctc aggaaggaag gatgctcagc tgtgctgagt gctgctggaa 84720 

ggtgaataag aggagacaga agccactgtt tgatttcttc aggtggatgt tgtcagagac 84780 

cttgaaaaaa gcaggatgaa tccaatgact aagacagttg aagagtcaat ggtacataaa 84840

gcagtggaag cactagggtt atgtgtaatg gtgcgatttg ctgagttagg gattattatc 84900

agacatattg ctgatatgtt attcctagac ataatgctgc tgctacatca gagagattgg 84960 

ttggcagcga atggggcact gtgaagtgtg actcgagcct tctcgtgttg ccaactgcaa 85020 

cacagatcat cgtcctagtg cttggcgatg tggttgcatt atggtgagtt gagtgtggcc 85080 

ttgggaagca tctgaatctg ttggctgagt tatcagggaa aaaaaattta aaaagtaaac 85140 

taagattatg tatattaatg aaaaagttgc tgtatttggc aaatacttta aatggataag 85200 

gctaaaaacc aacaagtcga gagggtactt gttgccaccc atccttttcc aaatcatggc 85260 

cttcaaggat cacactgttg gtctttcctt ttcttttaac ttggatcaac tgtgaagtaa 85320 

cacaggtctt cagtgtagat ctcagttccc caacatttgc cttatgactg agacctccag 85380 

gacgtcaact tggtccatgc tgaactgcag cacaaattcc aagctttgac catacctcaa 85440

ggtgcacttt aacctttgca gtgttctgcc agacatctga actttcactt ttgtttctga 85500 

catctcaatc acacagttct cactgtaaat attaaataat agcacagaat attttaactt 85560 

caggtattca ttggaaaatt caaccatggt ttggttttat ctgtcacttc aaaaactgtc 85620 

ttcagctgtc catcatttag atgtcattta gatgttcctc agggactttg gggacattgt 85680 

taacaatctg ttatttcaag gcttctaaac tctatcccca agttaaaatg atttccaagg 85740 

aacatcatac ttctcttaca gtctgtgtgt aagcaccctc tgtgaattcg gttttaggga 85800 

caatgttagc ttttgaagag agctgatgta agaaatacta gattttagga aactgttgta 85860 

cttttttcaa agctatattt gacgacattg tacattttgc tacctgatac ttttgatgta 85920 

tgatccacct aatgcctttc tcctaaaatt aatttccagt gaattgaata ggaattccaa 85980 

atgaaatgaa tttcatagga aaatctcata cagaaaattt gttaggctgt ccttaaccag 86040 

agaatgagaa ttatgtaatg cggttttgtc agctagagta acagcttgcc ataggttcat 86100 

aatagagctg ttttttagtt ctttttcttg ggttcttgtt tctgaaagaa agtttctctg 86160 

ccagaatatt gaagtcgtgc ctaagttaat aatttaacaa gcattgtata tattaataat 86220 

ataatatcaa taattaatgc tattaatcat taataacaat tatttaatat taatattaaa 86280 

tacttaatat taaattttta gaatattaaa atttaaaatt taaaaaataa aatttatcaa 86340 

aaaaaatttt ttttttactt ttgaagcatt ggttttatta aactttcaaa gtagtatggc 86400 

aaaaaggtgg ccacatacca aatagtgtca tacatttctt aaaatctctc ctagcaaata 86460 

aacttaaatt gagatcatga gtcagttgaa aagacaattt aatttttttg ccatacaatt 86520 

aaagtatttc tgagaagtca gagtgctttg caatgtttgg tgaataattt acacaattcc 86580 

agaataatgt ctcacttatg gagaatacac ctaccactta cttcgataaa cagaagtaga 86640 

gtctatggtt tctttctttt tttttttttt ttttagctgc taaagattat tattaggaca 86700 

gaaggacaat tagctttaaa agcattcctc agaacatgta tttttttttc tagtattctt 86760 

ttttttttat tatactttaa gttctagggt acatgtgcac aacatgcagg tttgttacat 86820 

atgtatgcat gtgccatgct ggtgtgctgc actcattaac tcgtcattta gcattaggtg 86880 

tatctcctaa tgctatccct ccccactccc ccgaccccac aacaggccct ggtgtgtgat 86940 

gttccccttc ctgtgtccat gtcttctcat tgttcaattc ccacctatga gtgagaacat 87000 
gcggtgtttg gtttttttgt ccttgtgata gtttgctgag aatgatggtt tccagcttca 87060

tccatgtccc tacaaaggac atgaactcat catattttat ggctgcatag tagtccatgg 87120 

tgtatatgtg ccacattttc ttaatccagt ctatcattgt tggacatttg ggttggttcc 87180 

aagtctttgc tattgtaaat agtgccacag taaacacacg tgtgcatgtg tctttatagc 87240 

agcatgattt atagtccttt tgggtatata cccagtaatg ggatggctgg atcaaatggt 87300 

atttctagtt ctagatcctg aggaatcgcc acactgactt ccacaatggt tgaactagtt 87360 

tacagtccca ctagcaatgt aaaagtgttc ctatttctcc acatcctctc cagcacctgt 87420 

tgtttcctga ctttttaatg atcgccattc taactggtgt gtgatggtat ctcattgtgg 87480

ttttaatttg catttctctg atggccagtg atgatgtgca tgttttcatg tgtctgttgg 87540 

ctgcataaat gtcttctttt gagaagtgtc tgttcatatc cttcgcccac ttgttgatgg 87600 

ggttgttttt ttcttgtaaa tttgttagag ttctttgtag attctggata ttagcccttt 87660 

gtcagatgag tagattgcaa aaattttctc ccattctgta ggttgcctgt tcactctgat 87720 

ggtagtttct tttgctgtgc agaagctctt tagtttaatt agatcccatt tatcaatttt 87780 

ggcttttgtt gccattgcat ttggtgtttt agacatgaag tccttgtcca tgcctgtgtc 87840 

ctgaatatta ttgccaaggt tttctatgct atagaaatag catatttcta tgctattcat 87900 

cattaataac aattatttaa taatattaat attaaatagt taatattaaa tttttagaat 87960 

attaaaattt aaaatttttt taaaaataaa tattttatat taaattatca aataaatatt 88020 

aataataatt atttaatatt ataaaattaa taatctttca ttattgaatt attgattgag 88080 

ttaagtaatt aattgattaa ctgataagga ttattgttaa attattgtac tcttgggtag 88140 

tacagagact gcatactgcg ctttgccatg taaatactat tgtctacttc ctggtacgtg 88200 

gctctaggga ggctatggca gagtcaagtg cttttgccct taatgtgaac aaaaaatagt 88260 

gattgctctt agtagccata atatttggtt tattgtctgt gttggtaata atttctgctg 88320 

tgttttcata cagtgaagtg atgtttctgc tgtttatttt agttgcattg gaatttgtta 88380 

tatttatttc tttgttttcc ttttgataag agaagtacgc acttagttat ttataaagat 88440 

gtttggactt cacatgtgag tacagtggtg acatgctggg ttttcctggt cattgcttag 88500 

ctgtatttat aaagtgaata ttactgagca gttaagcctt aacatcgaga atcacccatt 88560 

ttcatttttg aaaactggaa aggattaggt agaatgcaag gagaataaat tgaacttaaa 88620 

tgtttgtgtt caattgaggt gagctttttc ataagaatat tcaagcctag gtcaacatgc 88680 

agcttgtttt ccctctcacc acctggaatt cagtctctat cggtcaatgt cttctaaaag 88740 

ggaaatgggt tcttaactat atacttttag tactttattg cttatcttcc ctttcttggt 88800 

tgaataggct gtgttggata tttagcttcc tgcccctttc tttatgagac agctagggca 88860 

gtgcttttca aaaccttact aatgtgtgga tcacctgggg gatcttactg aagtgcagat 88920 

cctggttcag tgggtctggg tctgctcagg cttgaggtga ggtccacgct gctagtcctg 88980 

tgacccagca ttaggtcccc aggatacaaa atatgaccgg ggatctctgt cgtattcggg 89040 

ggtggagatg agacagcgtc ccaatgatgt tagtcacatg gaacatttag agatgcggag 89100 

tactttgtca gtgttttaca catcgtcaag ctgttagtca agacagtaat cctctgtgga 89160 

aactgtgggt tgaacacttt cagtaaattg ctcatggtca tagtgcttgg aaatagtaaa 89220 

tttttttttt tttctttgag acagagtttc gctctgttgc ccaggctgga gtgcagtggc 89280 

acgatcttgg ctcactgcaa catctgtctc ccaggctcaa gcaattcttg tgcctcagcc 89340 

tcttgagtag ctgggattac aggtgcatgc caccacacct ggctaatttt tattttttgt 89400 

agagacagag tttcaccgtg ttgtccaggc tggtctcaaa ctcctgacct caagtgatcc 89460

gccgaccttg gcctcccgag gaactgggat tacagatgtg agccactgca tcctgccaga 89520 

aatggtgaat tttgaatttg aattcagctc ttcctcaatt catagcccac attctttcta 89580 

gcatctactt ccaaagatag cctagagagt attttttatc ttctatagct gtaaaccttg 89640

atatgggcat tctctgatgg cctgtgtgtt ttgaaaagat taatggataa ggcagtggat 89700

ttcactgcta accttgctac accgtagctg tgtaaccttg ggtaaggcag tttctttatc 89760 

tgtaaaagaa tggaaagatc acctaaataa agtactcagt aaacactcaa taaatattaa 89820 

atatcgttat tattcaacaa gcatttttga cgctgatcac tagccttcat taaaagtata 89880 

acttggatga acgttgaaca caccgagtga aaggagccag acacaaaaag cacatgttgt 89940 

ataattcctt tcagacagta tatccagaat aggtaaatcc atagaataga aaactaatta 90000 

gaagttacca gggatggagg ggagagaggg atggggagtg attacttaac aggtacagga 90060 

tgtttttctg gggtgatgaa agcattttga aactagaaag aggagctggt tgcaccgcat 90120 

catgatataa aatgccattg aattgcacac tttaaaatgg ttaattgtat attatgccaa 90180 

tttcacctca cttaaaaaaa gtcatatatg gaaaatagct ttaaggcacc actacaacta 90240 

ctaaataggt ttgtattttt aaaagaactt tatggaatta taggaagcat ttcttgatgt 90300 

tatgagatgt gttggaaata cagaagaata gcttattttg gaacagatat tattggcttg 90360 

aaattttgcc agttcaagct ggtctctttg gaagactaga cctttatttt ctggcttgaa 90420 

aatgctttgg acataagtac cctattattt tgttgttaaa aattatacta ttgacatccc 90480 

caattttttc tcctgaagtt cagtataacc tagaaataac ttcattgcta cactatttca 90540

ttaactacat gggtgctttt ttagttaata atgatgcata atgtcttcat gtggcagaaa 90600 

cactaacctg ccccttgtca taaatctgta aaaagatgga cattggttta aacccagttg 90660 

ttgaattctg tgcctttaac cagtatgtta cactgtctag ttggggaaga atcccaaatc 90720 

ttcttctttc tttagaaaaa tccaaaacag catacaaact agcaaactct cataaatgtt 90780 
gtttgagaaa atcaattgcc ctaactacta agacaaagga tctataaaat ctgatgagaa 90840

caatctttgt aatttgattt ttataatttt gtcagcttaa attagtaaaa agttaataat 90900 

tattactttt gttacgctta taataaataa tgtgtttcta caccttccat aaacacctac 90960 

aaccacactt tttaccacag ttggtggagt gaagggtgga tggaggagat agtggcaaaa 91020 

acaccccaat cactttcagt gattaaagta aagatgtgtc taactttact cctaaagtat 91080 

catccagtaa agtggaatgt aaaacatact tttgaactgt ttgaaatcaa ctacattcct 91140 

atggcttacg actgtgggac aagtttctaa ctatcagatt tgatttttaa ttaatcagtg 91200 

atattttata ccagcagtct ccaacctttt tggcaccaag gaccagtttt gtgaaagaca 91260 

atttttccag ggacttgggg gttgggaggg tgagggagga atggttttgg gatgattcaa 91320 

gcacattaca cttattgtac actttatttc tattattatt acattgtaat atataatgaa 91380 

gtaattatac aactcactgt aatgtaaaat cagtgggagc cctgagcttg ttttctgcaa 91440 

ctagacagtc ccatcggggg gtgacgagag acagtgacag atcatcaggc gttagattct 91500 

cataaggagc atgcaaccta gatccctcat ctgcacagtt cacaataggg ttcgcacttc 91560 

tatgagaatg taatgccact gctgatctga caggaggtgg agctcaggtg gtaattaaag 91620 

caatgggaag tggctgtaaa tacagatgaa gcttccttca ctggttcacc caccactcac 91680 

ctcctgctgt gtggccccgt tcctaatagg ccacagactg gtaccaggac ccctgtttta 91740 

cacgatgtgg agtcttttgt atgcaaagaa tattgttgac tttcgccaca cggaagcccc 91800 

cccgccccgc ttcccccgcc tttttccttt ccagttacat tcccacaggt attcttagta 91860 

ccacaactgc agttgaattt cacagtatgg tgggtggtaa gctatggtgg gcggtaygct 91920 

tggataagcc tggctattta gaaatttgga ataaatgtag tgttatgact aacagtaatg 91980 

ttgcctatca aaaattgtga atgttaataa atgttttcaa cacaatcatt aatgctttcc 92040 

agtgagttaa accagcttca tgttacagtt gtattttcca tcccagtagg gagtcattat 92100 

taaatggggt catgttttca agcccaactt aaaatccctc ttacagattg ccttccccac 92160 

cccaccccca gttttctctc atcacttata cattgaaata attgcttatt gttttccctc 92220 

tttaaatttt ttttgagaag tcaaaaattg agtaccttgt tcagtgtttt tgcttatgaa 92280 

atactttgtg aataaatttt gttcttagct gaagaaaatt tcttaggcag ttaagaaaat 92340 

actaataagc taattaatga ataaaaacta atttcattgg tcctgattgg aagtgcaaca 92400 

tttaccgata tttagctata atccttttga tcagtcagaa atttgtaatt attctttgag 92460 

aaataaaaag ttgagagggc tgggtgcggt ggctcacacc tataatccca acacgttgag 92520 

aggccgaagc aggtggatca cttgaggtca tgagttcgtg accagcctga ccaacactgt 92580 

gaaaccccat tctctaccaa aaaaaaacaa aaaaaaagaa aaaagaaaaa aattaaccag 92640 

gcattgtgat gtgcgcctgt agtctcagct acacaggagg ctgagtcagg agaatcactt 92700 

gaacctggga gacgatgctg cagtgagcca agattacacc actgtactcc agcctgggcg 92760 

acagaggaag actgtctaaa aaaaatagaa aaggaagttg aaaacagctt agggaagagc 92820

tgcaaccact gaccagcacc agtactccat cataatatat gcttttcact tataaggaac 92880

tgtaatgtaa actgtggact ttgggtgata atgatgtgtg aacacgggat gactgggtac 92940

aacacatgta gcactccagt gggagacatc aaaatgcata tgtggcggca ggaggtgtat 93000

gggagctctc tgtaccttcc tcttaatttt gctatgaagc taaagtggct ttaaaaatac 93060 

aaatacagaa aaaaacttgt gctttctata gattaatttg aacatagaca cattaatata 93120 

atagatacat tgatttgaac ataggtacat taagttgaac acttaaggtt tttatgatgt 93180 

cctataccac aataaactga agaagtctgc cttacaaatt tgttcaaaga actctcaatg 93240 

ctctcactgc tccttccctg ccttgaacag gaagtgtcat ccagtgcaat aagggggaaa 93300

ataaaatgtg catagcaatc agaaaggaag aaataaagca gtttctattc acagatgcag 93360 

ttcctattta aattcatcag caaggttttg gttttatgaa tgataatatt aaaatgtaaa 93420 

aaacactatt ttcattatgt aatgtgtcac ctacaagatg ctgaattcct gttgcagcgg 93480 

atgctgaatt cactctgccc ttcttataag aaatatgttg ggccaacctt ttgtttttaa 93540 

gtttgcttac agccttacct gtgctctttc aaagtagatt ttcactattt tgaacactct 93600 

attaaggtaa agatgtgttc ggccaatgaa actactagag caaaatgttt acactgtatt 93660 

tctgatttga ttgttttaat acaactgaat tagtgttttc tcctatctct atgcaatatt 93720 

aattcctggg atgtctgtgt aaattaatta atttactgac cagaactcta ctttagcttc 93780 

ttatggtttt gttttcttaa catttagaaa cggctaaatt tagaggacat aaattttctc 93840 

catgagattg tttaaattca gttgactttt taatgtggat tatatttgaa cttgaatgcc 93900 

gcacgcattt ttaatgctgg ttcatggctt ctgtcactgg tacgttgtat ttctcactgt 93960 

actattcttt tacgttgcct cttgtctgaa atgaacttga ttttaacctt ttattttctg 94020 

gtctaattat atgagcttgt ggggagcctc acatattgtt agtatatctc cttaaataac 94080 

atgcattgag gctgaggtca gcagatcact tcaggccaga agttcgagac cagcctggcc 94140 

aacacggtga aacccgatct ctactacaaa tacaaaaaaa attagccagg tgtggtggtg 94200 

ggcgcctgtg gtcccagcta ctcaagaggc tgaggcagga gaattgcttg aacctgggag 94260 

gtggaggttg cagtgagctg agattgcacc actgcactcc agcctggatg acagagtgag 94320 

agtttgtctc aaaaaataaa taaattaaat aaataaataa aataaacatg aattgtataa 94380 

tccagctttg ttattttagc tctaaacttc tggtgtatgg agacagattt tcagggagtt 94440 

tggtcctgga ggagagacgg ctgcagaacc tcaaatatta ctgaattaaa aaggaaaaga 94500 

ttgtattgat cattttaccg tgtggggatt caaatactaa gaggataatg atgatgataa 94560 
tgatgacgat gaaagcttgt ttatgggaca ttttactctt ccaaagtctg ggaaggaatt 94620

tcaagtgtat tctggggact tctgaaaata ttagccaatg ttagaaacaa agtcgcaagc 94680 

caaagggatt gcttttgaat ttaggcttgt gatccatctt cttttaattc actgttttaa 94740 

ttaataaaag tctggaatat ttacagagga ttgtttataa aacttcacaa attagaaact 94800

tggaattaaa aatatatata taaaatattt catatgtgta aaaacaggat aatatttaaa 94860 

tatctgacct catgagaata atgactcaga tttcttgtta tcgtgagact ttttctcaat 94920 

caacttttta ttaatattca taacgtttat gcaacatgaa gattctgaag ggactttgtt 94980

gtctgagaac acatctattt cagatctgcg gagtgtatca ctttttgctg tgtcttcaaa 95040 

gtgattcttg gtttattgcc tgctaaggct aataaatgta taataaatct gcttgttgtg 95100 

tcacttgcag gtgctatggt ctttagaatt gggtcactgg atttctgagg agccgttcga 95160 

actgtctcac cacttccctg cagctcccgt aagtcagatg ttgttttacg atggtaaatg 95220 

cagtttgctg ttctcaagaa attattataa acataagggt ggacttaagt ttttatccag 95280 

tcaagcacaa ttatgcccat aattaaaaag acattcacag aacttaacac cttttatcaa 95340 

tttattcgyg agaacaaatg tgagaacgtg agaccactgt gcaaaaagta gtgaggaatg 95400 

cagtccaaag aaaatttgac gattaacatc ctcagaactg agaaaaacaa aaatgaaaaa 95460 

agactgaatt cttgggcagg tagtcttata tcttgcttaa tgtttttact kttaatagaa 95520 

atagaactga taggtataaa gattatggct tgctggtgct gtgataacag tatttatatt 95580 

tttatggctt tcctaaattc cacttcaact ttcaaatgct tcattgaaaa gttctgggtt 95640 

ctaatttttt ttaagattaa gtaataatta agtggataat ttaaagtttg cttggataca 95700 

ggattgtgca gaagttgcct ttcctgttca aaaatgttaa tttgtttgtc acagtttatt 95760 

cattcaaaag attaatagct gaaagataaa tggtgatttt tatctgccac tggtgttgtt 95820 

atttagctgt ttgagtaggc catatgacta aaacataaca aggagttgaa ctgtgctccc 95880 

tgatcactgt agttatctag gttgttgggt tgttttgttt tcatttttaa gattactgtt 95940 

tgatttcctt tcagctttat aaacattttc ttaaggagag acaaaagctc ctctcagcaa 96000 

aactgtttgt ttgaaatacc gtgtaaggaa ctgaagtgta aagtaaaaac acaaattccc 96060 

cccattctcg ctcataagag attatatatg atgcacaatg acataatgag atttgtcctt 96120 

gaatttttta tcacctgcct acaaagagaa ttgatataaa ttgtgttgtt gccagttttt 96180 

cctgcattar cgtttcccta cctaagtatc catcactctt gtcattgaga tatcctagaa 96240 

acttgttgtt gtctttcgag gctgtgaaat tttcttattt tcagttgttt ttcaacttga 96300 

tacaaggcca tgataccgtt gttgaattca taaaaccttc ttaaatataa agtagataca 96360 

gttctaagat agggaggttc ttaactagtt aaatagttgt tggaaaagtg caccttggtg 96420 

gaaataaaac agagccttga ctttgccaga gtccatcatt gactccaaat atgtagcaac 96480 

acctgtgtgt tctaaaacta cgtcaagtgg tggggagaag ttggggtaaa ataaattaga 96540 

ttttgaaatg gaataaagaa aaaataatgg tagaacactg taaggtgaag acagacatat 96600 

agtagatgct agttacagac tggactctga acttccttgc aaatgattca gaaaagaata 96660 

tatgagaaat tgcctttaaa ttataaagct ttacacaaat gttcattagt attaattgta 96720 

ctatgaaaat ttcaaaagga gttaaaactc caggagttta tggttttgta gtcccgagta 96780 

taaagctgtg ttctcaaatt ttcttttctt tctttttttt tttttttttt tccgagatgg 96840 

agtttgcttg ttgccccggc tggggtgcag tggtgcgatt tggctcactg caaccttacc 96900 

tccctggtgc aagcagttct ccctgcctca gcctcccgag tagctgggat tacaggtgcc 96960 

cgccagcacg cctggctaat ttttgtatta tttagtagag acagggtttc accatgttgt 97020 

ccaggctggt ttgaactcct gacctcaggt gatctgccca ccttgcctac caatgtactg 97080 

ggattatagg tgtgagccac tgcgcccagc cctgtgttct caaatttttg gtaaatattt 97140 

aaatatatta tgaacatcag attttgtttt tgcactttga aacccttttt tttttttcag 97200 

tttgctgatt gacataaaaa aacttactag tgtcaattat ttttttcctt aagtaaattt 97260 

aagggtgaat cttgagacat atagctttgt aaawttctta aatagaaggc ttttctcaac 97320 

cagaaattaa attgtagtct agttctataa aaatatatct tactaggaaa gaaaacagac 97380 

ctctgtttta gaatagtgag aagatagtaa agtttctttg tcatagaatg aaatgtataa 97440 

ttttcctcat cattaaaagt aagaagtttc cttatcacaa ggcacaatta ggtcttttgg 97500 

aaacaaatta taaaattgta aatattatca taaaagttaa acataggcat atcccctaat 97560 

aagttatatt taattactaa aaataccttc atatttaaca atcaggcaga aaaaaatagt 97620 

acggtctgca tataaactaa aatggcacgt ttctgttgat aatttcagag attctggaag 97680 

tttctaccat ataaatttga aatacgtatt tgagcattaa cttataacta agctgtcaac 97740 

ataaatgtaa atacgctgtt tttgaaataa aaatttaaag cacctaagag atggagtaaa 97800 

aatgcactaa ctgtttttcc aaatattaaa cttctagtaa ccccttctca gaatatccct 97860 

gaatatgtct ttttatggct tagagagttt ttttcttcct tttaattgtg atagtgatgg 97920 

tgaattcagg acatatgggt atttacacag tgtataaaca gtgctcagaa gaatgcagtt 97980 

ccaagatgat ctgtattgta taacataagt gttctgtttt ccakttattt actgataaac 98040 

ttgcacataa cattcttggt tgtgacagca gcgtctgtaa actgtcagtc tgattctcag 98100 

cctcgggttc atctttgcat aggtgttctg tctaatcaca attatggatg tttagggtct 98160 

tgctttggtc cgttaagtga tgcaagttta agtgataaag tttacaggct ctaatctgga 98220 

gcatgtgggt cccgtcagca ccgagcacac gccctctgtg gtggaagagg acacagtgcg 98280 

caccgtgact ttcagtgcac tgggcttaag tctttgaaaa tagttcgaga cagttcctca 98340 
ggtggactgg
atagcctgag 
ctgtggatag 
gaaagtaaaa 
caatactcag 
tggtgagtat 
cacagttcca 
gtggagaggc 
cagcaggagt 
tttgcagtcc 
gattttttcg 
gaagtctcta 
ctaaagccag 
accttaattt 
cacagtagtc 
gctgggccaa 
cagtggctca 
caggagaccg 
aaattagctg 
gagaatggcg 
ccagcctggg 
taaaccttaa 
tccctcccag 
ctcctgatca 
tatgttcaag 
ctggcaattc 
aagttcctga 
aagaacaaat 
tattgttaag 
aaaaaatata 
ggaatgtttc 
ctgaataaat 
taatttttaa 
aataaagtga 
ttagccttac 
tatatattct 
catacacata 
tatatatata 
tttgttattt 
ttgattcagg 
tattcaggac 
gttgctgtcc 
tggctggaat 
tcctctcccc 
taaatatatt 
catacgggtg 
gaaatggtag 
ctgctgttca 
aaaatcaata 
tggtaccaaa 
tgaggcctga 
ttcatattat 
agacggagtc 
caaactctgt 
tacaggcgta 
ccatgttggc 
ccaaagtgct 
gtgtaaatgc 
aacatgttga 
cagatatgat 
ttcatcttgc 
gaactaactt 
ttcctgtcag 



gatgtttaga 
cctttccagt 
tacattccgt 
cacagaagga 
ctaaggcagg 
aggatgtgca 
ttacggcaaa 
ctgaattctc 
aagcaaacat 
cagcaacatg 
gttgaagttt 
tgtttcagag 
ggcaggagag 
ctgggcttgg 
cccccttatc 
aaatattaaa 
cgcctgtaat 
agaccatcct 
ggcgtggtgg 
tgaacccggg 
cgacagagtg 
attgcatgcc 
aacatgaata 
cttagtagct 
taacgcttat 
agatatgtca 
cttaataagg 
cttatatcca 
tctcttgtga 
cacatagggt 
ccctaaggat 
ctcaaaaaca 
aatagagaaa 
attaacctcg 
ccgtaatgca 
tatgtatata 
ttctttatat 
aaaatatata 
catccaggtt 
tgaatgcaga 
tttgacagat 
atgcaaaatg 
ttgacctctt 
agtctctttt 
gtttccttgg 
gtctaataag 
acaggaaaca 
gctttaaaaa 
ggagggcttt 
tagtagtcat 
ttctttcagc 
tttcattact 
tcactcttgt 
ctccccggtt 
tgccaccatg 
caggctggtc 
gggattacaa 
atcataactt 
ccatagctgt 
ttatgttctc 
actattggca 
gacagcattt 
tgaatttcta 



aatctgctgg 
agtaccattt 
tcaagttgga 
attaggaact 
aggcacactg 
ctggcagagg 
tgcttttaca 
cacagtccta 
ctgaggcaca 
gtgggctgga 
aamctaaaat 
acaaaaagga 
gtgtccagca 
ccaaaaacag 
tgctattttt 
tggaaaaatc 
cccagcactt 
ggctaacacg 
cggacgcctg 
aggcggagct 
agactccagc 
gttctgagta 
atctcctcta 
gtcttggtta 
ttgacttaat 
aagagaagct 
agagaaaaaa 
tgaaattgga 
ctagtttaca 
ttgatactgt 
aagggaggac 
ctgtgttgga 
ataattatga 
acccaagcgt 
ttatttctta 
tagaatatat 
atgtatatat 
tatattcttt 
cctacattct 
ttggacggaa 
tcataggtca 
gagctgctca 
cacaggcaag 
ccaattacat 
ttattaatgc 
cacacccttc 
aagtcctagg
ggatgattgt 
gctcattggt 
attatctcag 
ataaaaggca 
aaatcctcct 
cgtccaggct 
caagagattt 
cctggctaat 
tcaaactcct 
gcgggggcca 
gggtcatcca 
tacctttggt 
tagaaattaa 
gagtttttgt 
gtcacacttt 
aacttttaac 
tcggatcatc
aatgccgttg 
aggaccacat 
aggtgatgcc 
caggcgtgtg 
gattgttttc 
agccttcttc 
tttggtaagc 
gttggaaaac 
ctcrctcagg 
attaagtact 
aatttaaagt 
caggggctgt 
gagcatgctg 
gctttctgca 
tagaaatatt 
tgggaggccg 
gtgaaacccc 
tagtcccagc 
tgcagtgagc 
tcaaaaaaaa 
acgggataaa 
tccagtggat 
ttagatcgat 
aatggcccca 
gtaaagtgct 
atcgtacact 
agcaatatgt 
aactaaactt 
gtgtgatttc 
tgctgtaacc 
ggaacacata 
ttgatatctc 
caggtaggga 
ataaaaactc 
acatattctt 
ataaaaacat 
atatatatat 
ttcttggtgg 
gtttgcgtgt 
gtgccttctg 
ttaggctggt 
accactccac 
tctcagtccc 
aattctccta 
tcaaggagag 
tgtctgtggc 
gccaggatga 
gtaataatgg 
gaaccagagg 
tgaaatttaa 
tttgactgtt 
ggattgtact 
tcctgcctca 
ttttgtattt 
gacctcaggt 
ccatgcccag 
tttgtttaat 
tttcctgggt 
accctgccaa 
tgctacttta 
tttcttgtct 
aaatcagaaa 



atggttgtgg 
aacttatttg 
gcatcaaacc 
agctcccacc 
gagtaggcac 
cagccataca 
caccttttcc 
ccacagtgtg 
tctccttcaa 
ctccccttgc 
cagtggagct 
gagagtgtgt 
gggagtgaag 
gggtttgtga 
ctttcagata 
ctataagcag 
aggtgggtgg 
gtctctacta 
tactcgggag 
cgagatcgcg 
aaaaaaaaaa 
atctcctgat 
ccacgctgtc 
tgtcacagta 
aaagtgcaag 
tcctttaagt 
aaggatgcta 
tgtatattat 
tgtaagtatg 
aggcatttgc 
ttgattttac 
cagtatgata 
catatgtagg 
atggcactgg 
tatgccaaag 
tatatatgta 
atacatattc 
atgttgtgtt 
taacagctca 
tctattcaga 
gagcttgtcc 
tcattcatgg 
tttctctctt 
taaatcttga 
ctctcctgag 
agctgggtcc 
tcctccacct 
aggaaacagg 
tgtaacatag 
attgcttttt 
agacatgaaa 
aatgatgctt 
ggtgcgatct 
gcctcctgag 
ttagtagaga 
gatccatcca 
ccctattaat 
gtagtaactt 
gggtaacata 
ttttcctgtt 
aatctttcag 
cagtcactaa 
aataacactt 



ccttgagcga 
tgttctgcct 
accagcctgt 
acgaagacag 
atgcagatga 
cccatgacat 
cttgtgctgt 
tacacactta 
ccaggattac 
tctattaaat 
acataaaaag 
gctcgctcag 
ccccatctgc 
gagaaagaaa 
cctgaagtca 
gggccgggcg 
atcacgaggt 
aaaatacaaa 
gctgaggcag 
ccactgcact 
aaagaatgta 
gcccacttca 
tacatccggt 
tcgcagtgct 
agtgatgatg 
gaaaaggtga 
agatctatag 
tcagttttat 
tgtgaacagg 
tggacatctt 
atatgttaaa 
ctccttatat 
aaaattaata 
cagctcctct 
aatatatata 
tatataaaaa 
tttatatatg 
tatacattgg 
gtgacttcat 
atccttcaca 
aactagagaa 
tccagaccac 
gggctgtttt 
tttgcgtaag 
aagctcagca 
agcatgtggg 
gaccctttcc 
aagcttttgc 
ggaggacctg 
ttttttttta 
attactgaat 
tttttttttg 
cggctcattg 
tagctgggat 

tggggttttg 

ccttggcctc 
gattcctata 
tcatttataa 
ttaatttttg 
attctttaca 
tgtttttcaa 
gtagcgtttg 
tcttttcttt 



98400 
98460 
98520 
98580 
98640 
98700 
98760 
98820 
98880 
98940 
99000 
99060 
99120 
99180 
99240 
99300 
99360 
99420 
99480 
99540 
99600 
99660 
99720 
99780 
99840 
99900 
99960 
100020 
100080 
100140 
100200 
100260 
100320 
100380 
100440 
100500 
100560 
100620 
100680 
100740 
100800 
100860 
100920 
100980 
101040 
101100 
101160 
101220 
101280 
101340 
101400 
101460 
101520 
101580 
101640 
101700 
101760 
101820 
101880 
101940 
102000 
102060 
102120 
tttttttatt ttttttgagt gaattcttgc
gatcttgggt tactgcaacc tctgcctccc 
ctgagtagga gtagctggga ttacaggtgc 
tttagtagag atggggtttc aacatgttgg 
tgatccaccc tccttggcct ctcaaagtgc 
ccagaaaaat aacattttct aaaactttat 
ataccaaccc atctgtttta agtgactact 
tcattgtctg ttttatggca gtgtaatacc 
agaagatgac tcattttaat ttgatttact 
taggtgttgc tatatattaa tggaatcttt 
ggctcacacc tgtaatctca acacttctgg 
gcagttcgag accagcctgg caaacatggt 
tagccgggtg cagtggtggg cgcctataat 
tcgcttgagc ctgggaggca ggggttgcag 
caaagtaaga ctctgtctca aaataaataa 
aataaataat aaatgacagc tggaaattcc 
tatttctata atctatatta ctgttgtggt 
aagcaaaatg taattaaacc atctctctag 
gctaagaatc aaatatcccc tctccttgcc 
ttgctgacaa gtctagctcc atatcatatt 
aaaacacatt tacctatatc aagctagtgt 
aaatcacaat ttgtagtcca aattgccagc 
tacaaatagg aagacagaaa gtcatcccta 
atacatttgt cgttgtctcc atcctttgtg 
cttattttgc cggctgtccc tgtaagtcct 
aaaaaacaca ttggctaggg tcattgattt 
attccattct acagaagcgt gttctgtact 
aatattttgc actggtataa acaggaacca 
atgttttcat gaaaccttca acacatatca 
ccctgtaatt tattttcctg ctgctttcag 
ggctcctcct cccaggtctc cagtcatctt 
caacctcaaa cattattttc cgcaggggcc 
gaatggccac tgagtttggg gaagaaatct 
gctacatttc ctcatgtctg tatagtaggg 
aggaacaaat gtcggagtaa gttagacact 
atggggctat gttctgataa gcccattttc 
ggaatacacc tgacctttgg agcatcatag 
gaacactcac attagcccac agtcagacag 
ctgttgttca ccctggggat cacaggactg 
ggcatcatga gggagcatcg tgccacatat 
cccaaagtgt agtttctgct gaatgcgtat 
cttaagttga accattgtaa gctgaattgc 
accagcaagg agcatataag ggaagggaga 
tttttttttt tttttatttt ccataactat 
ttggcagatg gttcaggaca gatcagagca 
gcacgttggg tcctcacgtc ctgatggaga 
ctacaaagca aatcctgcaa tggtggcagg 
ccttctttgt aaaacctgct aatgtttgca 
acagtttgta ttggaagaat acaaagaaga 
gtcgggccag gccaggtgct tacacctgca 
tggtggccgc catctttgtt ttggcactga 
tccgaggtca gactcttaag tcatggaggt 
tgccatgttt caacactggt gcacacctaa 
gaaaagatga gggtagggcc atgtgatgtg 
tcagtattaa ggcagcccta gaaacttcat 
agtattttcc cacacgtttc aaaagtgaga 
tgcaaaatgc taatgtcata aatactcatc 
tttccattgt cttcccatta atcatagaca 
cttttaactc tctcttgtct cctttgaaca 
tccccagagg caagaaaaat aagggagaat 
ttttgtgtaa ttttaggtcc ttttgtggcc 
ggttattaag aatggccatg ttcttgaagt 
ttttcaatct ctgcagcttt gccagggatg 
tctgtccccc aggctggagt gcagtggcac 102180
aggttcaagc cattctcctg cctgagcctc 102240 
cagccaccat gcccaggtaa tttttgtatt 102300 
ccaggctggt cttgaactcc taacttcagg 102360 
tggcattaca ggcgtgagca ccgggcccgg 102420 
tcctatgttt gaactctcaa atgtttctga 102480 
acaatggttt ttggcttatg agtgtggttt 102540 
aaacctacaa tacaagaaag gtctcaaagt 102600 
aaaaaaggcg gattaactca tttgtgttta 102660 
tttaaaaaga cagctggggc cgggtgtggt 102720 
aggctgaggc gggcagatca cttgaggtca 102780 
gaaaccctcc ctctactaaa aatacaaaat 102840 
cccagcactg gggaggctga ggcaggagaa 102900 
tgagctgcga tcacactccc ttctggacaa 102960 
ataaacaact ggagactgtg tctctaaata 103020 
ttctttgaac attaaattat tagttggaaa 103080 
tgctacttgg aatttttaac tttttacata 103140 
tatccagcaa gcacaaacgc aggagagctt 103200 
agggctaggt cctgaggaga cacagttggc 103260 
ctcacttaaa acttagtcta aaaaaagtga 103320 
gtctacatat gaaattgtgg acatcgttac 103380 
ctttccctct atgaaatcat tccttgccaa 103440 
cctcctgtta gcatttgtga acatttgcaa 103500 
ctaaaatcat ttcctggttg gctgatgctg 103560 
ttraggtgaa tcctgtaagc gtgcaaagaa 103620 
accgtagtgg caaatttttt gtgatgaaga 103680 
cgttaatgga ctaatgcata ctctggacaa 103740 
acttatcatc aaatccttca gcaaagaggg 103800 
cttgcacaac tatcagaagc gactgtagag 103860 
ataaacagaa gagaaagaaa tgcagcacca 103920 
ccatagagac ggagtcctga gacaactggg 103980 
ccggggggga tggagaatgc agcagacaag 104040 
acagaacggt gctgaaaata aatccttgtg 104100 
taatgtaatt aaacttttag acattgagaa 104160 
atttacaata cagacgatcc ctgacttccc 104220 
tgttgaaaat gttgtatatt gaaaatgcat 104280 
cttagctctg gccttcctta aatgtgctcg 104340 
agccatttgg caacacggtg cacgcagygt 104400
actgggacct gtggctcgct gccgctgcct 104460 
cactagccag ggaaagatcc aaatttaaat 104520 
caccttcaca ccatcgtaaa gtcgaaaaat 104580 
aaaaatacgg cttacatcgg tcatctgtgt 104640 
agacaatatt tttgaggttg ttttttcttt 104700 
gctcaagagt ttctgctgca aagaagcttc 104760 
ggcattcacg taatggggta tgccatgttg 104820 
aacaggcaca cgaagaccca ggcgaggagc 104880 
agaagtgtac ttgaagcacc aagatgatgc 104940 
agctgccaca ttggaataat ataatttcta 105000 
gagaaaatgt tcttttagtt ttacctgctg 105060 
tgcacactgg atgcttataa ccacgtgcag 105120 
aagtcactga ggttcagaga tataaacttg 105180 
aggatttgca ccagatgcag caaatgcctc 105240 
acagagatgt ttgtttgttg aagaagttgt 105300 
gagttccgta agtgttgctc ctaagtgact 105360 
cctaaggcat gaactggaca tgtgagtctc 105420 
ctggccgtag ctcagtctct aaatgcctgc 105480
tctgttggga ttttgaaaca ctgtactttc 105540 
ggattgagat gaaccacttc ccttgcttat 105600 
tgtttagttc tcatggaact tgttaaatta 105660 
actatttttt atgagtctct gttagaaagg 105720 
cactggttta aagtgctttc tttaaaattt 105780 
tgctttacat tggtatgggt tgattttttt 105840 
attttatata acagtggagt aaagaggtaa 105900 
catcaacatt
tgatgaattt 
tctattgatg 
gaacatttct 
catttgcagt 
ttttattagg 
tttaaaccaa 
ggaaatatct 
gtaatgaccc 
ttcttttttt 
aaaatattta 
gttttgatct 
ttcatctcgg 
atgcttcact 
gaattttcat 
ttgttcaggc 
ggcagcctgc 
ggagtgtctg 
tttttgtgat 
ataatgtgtg 
ctggcgttgg 
tttcggagtg 
cttatcttgc 
attttagagg 
tgctgccaaa 
tcctgtagag 
gacaactaat 
agactaactt 
gctattcatg 
atgaaagcca 
caatggacaa 
ttttaaaccc 
tctcattttc 
cttttctttt 
cgatctcggc 
ccgagtagct 
gtagagacag 
gcccacctcg 
aatcttttaa 
cttgagagat 
tgtattttaa 
caatattctc 
catgtaattt 
tatcaaatag 
atggatttta 
aggataatga 
catgggaatg 
taaaaataag 
ctggagcgca 
tctcctgcct 
atatttgtat 
tgacctcagg 
agcacgtctg 
aggggagact 
tggtggttaa 
tgccttaaag 
gccatcctca 
ggcctgcaaa 
gcagaagcag 
tataaggcac 
aaagtgatgg 
agaatatgac 
ccaggagatt 



aacaattaaa 
ttaagattta 
catttatgca 
tctgggaaaa 
tatgagaagt 
ataataattt 
tgtgtgtgtt 
gttaagaacc 
attttgtaaa 
tttttttttt 
acttggaggt 
ctttggagtc 
ttgttgcctc 
gatgtttaaa 
ccactgtttg 
aacttgacat 
tcaaggactt 
acttgggcat 
gttcgtgttg 
ctccatgatc 
tgccagtggg 
cctctatgtg 
acactaggtt 
cagaaagtaa 
tgggtgagga 
gagagcagag 
cttgactttg 
ttgaaacctt 
tgacatatta 
attatctacc 
aatactgttt 
agagaaactt 
tggaatagtc 
ttttttgaga 
tcactgcaag 
gggactacag 
ggtttcactt 
gcctcccaaa 
ttttttcttt 
agaaatgttc 
gcacrtagcg 
ctgaagggtt 
tttctttgta 
cagaaattcc 
aataggtcag 
gaactacttg 
tgaagcacca 
ctaagtagtg 
gtggtgtgat 
cagcctcctg 
ttttagtaga 
taatccatct 
gctgcagctt 
gaagcaccaa 
ggttccgcag 
aatgtcctta 
cgtcgctgaa 
caggaatgca 
gaggaatgta 
ggagtagacc 
attaaaatga 
tcacttgtca 
gctgtttttt 



cctcagtgtt 
atgtacgcat 
tttcttataa 
aaaaatccct 
ttcagtccct 
tcgatgtaat 
tttctaaatg 
atctcagttt 
aatcttytta 
taattcttct 
aaaacactga 
aagttggaac 
tccaggagag 
gagtcacatc 
tacagcagga 
ttagccgctt 
taacattgtg 
taatgagatg 
accttcattt 
acgaggcgcc 
cacacagtct 
ccttgtgtac 
gtcaagtccc 
attcccagtc 
caaggtaaac 
ttgactaacc 
agattgaaga 
ttggaataaa 
tcatgggaac 
aggcagagtt 
gtatgaagtt 
aaagaagtaa 
acccagcaaa 
cggagtcttg 
tccgcctccc 
gcgcccacca 
tgtttgccag 
gtgctgggat 
caattaccat 
atacaatgag 
ttgctgrtta 
accaaatccc 
tatttgaagt 
ttctggtgct 
ttattagata 
gggagccacc 
ttaaacagtc 
gcagccttgt 
cttggctgac 
agtagctggg 
tacagagttt 
gcctcggcct 
ttgttttgat 
ttttaaaaac 
atcttgaaag 
ccactttata 
taattgtcca 
gggtataagt 
gaccctgagt 
atcggggttg 
tggtaaaaca 
tcatctgttc 
tagagattca 
atataaaact
agcttttagt 
catgtatttt 
tacatgctgc 
tcacagttct 
atctatttta 
gcctatccaa 
aaaatatttt 
tacaaacagc 
gttgtagatt 
ccaaccactt 
tgctgtgagg 
cctgatcttt 
catgtatatc 
catactgggc 
ctccgtgctg 
tctcctttca 
aagacgagac 
gctaatttct 
accagtctgt 
cacttctctg 
cagcattact 
attgctgttc 
aaggttgccc 
acccagggaa 
atcccaccta 
caattgaatg 
acagcacagt 
acttactatt 
cttctaggct 
tcttttatat 
tagtttagat 
ccttttaatt 
ctctgtcgcc 
gggttcacgc 
ccacacctgg 
gatggtctcg 
tacaggcgtg 
gaactcactt 
taagcctcat 
gtcagttgcg 
tgtaatgcaa 
ggatagtccg 
gtgacaattt 
ctaagctgct 
agcagaagcc 
ggcttaccaa 
ttatttgaga 
tgtaccctct 
attacaggtg 
tgttatgttg 
cccaaagtgc 
acagtttacc 
catgtcaaaa 
ctatttttca 
ttctttccaa 
cccgcctcct 
gacagagccc 
gcaggactca 
tctaagaaac 
taaacagtaa 
caggctaaag 
ttatgaatgg 



gccagaatgt 
ttcactagaa 
ccagttttcc 
atacaactgg 
cttatcatgc 
tcttgccaag 
aaattgattg 
tataatgtca 
ctaatcctta 
cctaactgtt 
gtgtctcaaa 
cccaaacacc 
ccataatgag 
tgtttctcaa 
attgtagagt 
cccaccacaa 
gactgttcag 
tgtaggtcag 
gacctcaaag 
gctctttaga 
cccctcccgt 
gtgcatgtgg 
tctctctcta 
atgctattac 
tgctgtgaat 
actctgccat 
tgtttaaact 
cacaagtatc 
cactgatttt 
ttgaagatac 
tgttataacc 
cttggttaaa 
ttttttttct 
caggttgcag 
cattcttctg 
ctaatttttt 
atctcctgac 
agccaccgtg 
acctataatt 
tcccttccca 
aaacaaactc 
gttgttaaat 
tcaacttaac 
agggtccttc 
gctggaagaa 
ttggcataaa 
aaaaatgctg 
gtcttactct 
gcctcccagg 
tgcaccacca 
gccaactggt 
tgggattaca 
ttatattggc 
gtcattggtt 
caagggaaat 
gtcctctgaa 
ccagcttcca 
ccccactccc 
gccgagaggg 
agatggtttc 
tataatataa 
accccaccct 
cacattttgg 



gtgtgaaaag 
agaatgaaat 
aacacttggg 
cgtctcaaag 
tagcatcatg 
caaattaagc 
catttctaaa 
gcrtacaagg 
atttttgtgc 
gccagttgaa 
attcattgaa 
tatcttctca 
aatagtgaat 
acatgcttct 
tttcagttgg 
tcctcccctg 
gtcgtggagc 
atgatgactg 
tgggtatttc 
ctcctttagg 
tgcacacaca 
cttcaccgta 
ctctcatggc 
ttatgattat 
ctgatgtatt 
ctctaaactt 
tcataaagac 
catcatttat 
acaaatacct 
acagtaaaca 
aaagttagaa 
tcattgtgtt 
ttttcttttt 
tgccgtggca 
cctcagcctc 
tgtacttcta 
ctcgtgatct 
cccggccaca 
gagttcttca 
gtctttaagg 
atttcccagc 
tcaattattt 
ayagaataac 
ccaaaggaaa 
aacttgtatt 
cagctcagtt 
agtccacctt 
gttgcccagg 
ttcaagcgat 
cacctggcta 
ctcaaactcc 
ggcgtgagcc 
cattctttaa 
agtttgggat 
tctttyctga 
aatcaacgct 
tgtcacagta 
cccttacgta 
ttctctggga 
aaataaattg 
agtgttgatg 
gatggctggt 
cagattggcc 
106140
106260
106320
106500
106620
106740
106800 
106860 
106920 
106980 
107040 
107100 
107160 
107220 
107280 
107340 
107400 
107460 
107520 
107580 
107640 
107700 
107760 
107820 
107880 
107940 
108000 
108060 
108120 
108180 
108240 
108300 
108360 
108420 
108480 
108540 
108600 
108660 
108720 
108780 
108840 
108900 
108960 
109020 
109080 
109140 
109200 
109260 
109320 
109380 
109440 
109500 
109560 
109620 
109680 
ccgaacccca
tttcactaaa 
atcatagggt 
tatgtgcaat 
agtgtcctgg 
gacttcacgt 
tatttctgct 
cttaagctaa 
aaacccctga 
aacatgtttt 
gttaaggttc 
gaataatgga 
tagaatccta 
ttcctacatg 
cagtgcaacc 
tggttctggt 
atgtttggaa 
ctcattttaa 
ctctgtattg 
ccagatactt 
ctgcattttt 
ctaagtgata 
tgacatcatg 
tcaacacagt 
tcaacaaaat 
taaaccagct 
ctgtatccta 
tccttgttcc 
acttaaatac 
gcaaaactct 
tattcgcttt 
catccaggtt 
tggactataa 
gatactgagt 
atacttccca 
atctgtccac 
cagcaaataa 
gcaaaccata 
accatacatg 
aatttttcaa 
caattgaaca 
cagagaagtt 
tcccaagttc 
cattgttctg 
gccacaaatg 
tgtgacttac 
tttttcaaga 
gtttctttta 
gaggctcata 
ttctgggggc 
aacatagatg 
tccatggcat 
ggccaagaac 
ccagatagca 
cagcagaccg 
cgttagggac 
gtttctttgg 
gatacatgtt 
tgtttttctt 
ttccaagagc 
ttgtgttggt 
taatatatac 
tcattcttcc 



catctccatc 
tgccagcccc 
ttggaggact 
actaatcaga 
tgcaaggtgt 
ttctctgtac 
cttaaactgg 
gcgcatgcat 
aaataagatt 
ccagaaagta 
aaaaccaatt 
ggcgatttgg 
aagaggagag 
aggaaagagg 
agagaactgt 
ctctctccag 
agataggtgc 
attaatctga 
ctatgatttt 
tgtgtatatt 
gaaaggagaa 
tataggattt 
ctgccaaaaa 
tccaagttta 
ggtcagtgct 
gaataagcta 
aatggtatgc 
aagaatgttt 
atgttctgtc 
taaatgtctg 
gattccatta 
tttagaaaaa 
aacttgcaga 
aaaagtgctg 
agaccttacc 
tggtgcacgg 
tacttgactt 
gagacaaacg 
cttttgataa 
aactaaagct 
ttcttttctt 
gtctggtagt 
taccccaggg 
ttacagatgg 
gcagagctag 
agagtcttaa 
tgagaaaatt 
ctgcaaatgt 
taatcccaga 
catggggtgg 
aaacaatgtg 
ccccatttca 
acgccaacat 
aaagagattg 
tgttgtttgt 
attaactaat 
aaagcattat 
actgtttcaa 
tgatactgtt 
ctcgaaaatt 
tcaatatgta 
agttatccag 
ctatcaaatg 



ctgtagaaaa 
tgcccctcca 
cctgttacat 
ttttttggtt 
gtgacctggg 
atagctcact 
ctctgtgggt 
gctcaatttc 
acataaatag 
agataaattt 
gtaaatattc 
cagggtaagc 
cgaaaagaga 
ggtgcagagg 
ccctggaaca 
gaaggtcacc 
cattaatgaa 
tgtagaaaat 
tattatcatt 
atctataatc 
aagatttagc 
gaataaggtc 
tagagaaacc 
tatcagttca 
gctcacttct 
ctttggtttg 
tcttgggttg 
gattcttcag 
ttgttcagag 
agagtcactg 
gtgatgatga 
agaacagagg 
gaaagatagt 
aaattgatta 
actccaatta 
tgcaggaacc 
gcttcaactc 
ttaggagtca 
atgccatctt 
gcagtatttt 
gaattagata 
ctttgaaggc 
gctgtccctg 
ggaaactgag 
aatttacatc 
gtctgcaagg 
gaggtttgtg 
ttaaagggaa 
gtactatata 
gacagtgcac 
atggggctgc 
catgcatcct 
ctggtgacag 
aaaggcttag 
gggtttgttc 
tttcagggtt 
gttgaatagc 
gttccaggtc 
ttaaatactt 
tccactttgg 
gctgctttta 
taattacata 
tttaagagaa 
ccattgactt
cagagagcta 
tctcgtactg 
cacatctgag 
cttggcttag 
gcccacagtt 
ttaaactttg 
aactcagctg 
ttttaaataa 
tctgccacat 
ttcttgtgtt 
ttcaacgcac 
actaagagag 
aagtgcagtg 
gcctggtctg 
cggctcgtat 
agttacattt 
cagtctgcac 
gctacacacc 
tcacagcaac 
aaagttcgat 
tctttgattc 
tgaccctttg 
tgaataatac 
atcaatattt 
aggattcatg 
aaagcattga 
cgtcaaaatc 
aacaagttta 
ccagcacata 
taaggttatt 
atttgtaaaa 
gttcaaatag 
tatcaggatc 
caacaaacct 
tgatatcttt 
tgaggcaatg 
ggtgtcctgt 
gcagtctctc 
ggcctttcag 
aatgagtgca 
agcgaaaatg 
tgttgtcaca 
ctctcagaaa 
caagtctgtc 
gaatttagaa 
aaagcttatt 
tataaatcct 
ttttcatctt 
tggggtggga 
atggttagcg 
ctgtcccccc 
acaatggcat 
cctagagtct 
ccttccttcc 
gatttaacac 
acaatattta 
tttaaaaccc 
aggataatac 
tacagtatcc 
catttatatg 
acacttcacc 
gccacattga 



gtgttcacag 
tgtgaagggg 
tggggctgtt 
cacaagggcc 
caccttggcc 
tttctctaag 
tccagaacac 
gagttttatt 
taaagattac 
acaaggtagg 
gtgtcccatg 
actgttctct 
tgttattcct 
gctgcctgat 
caggggactt 
tgccttcccc 
attaagaaga 
aagctaaccc 
actctcatcc 
tctgctgtag 
tttgatcagt 
ctgttatgtt 
gtaaataggc 
tgcctcttat 
ctttttaaaa 
tacatatatt 
cgtggctctt 
actgctatga 
tcttgttatg 
aatgatttga 
agaacatttt 
actggagtat 
agttatctac 
agcaaagcag 
aagggcagtt 
ctgtaaagct 
attaagtgac 
gaaatttagg 
tctcgtagaa 
gaaaagatct 
gaatcgggtc 
gtgaccacta 
acccccaagg 
ggttaaatga 
tatctctccc 
gtaacaattg 
gttccctgct 
aatgtttcca 
tctgcagaat 
agcttgcact 
gctgtccctc 
tcttgacact 
gcacagaatt 
gttcgttgcc 
ctgttgtatg 
agcattaatg 
tcttttccgt 
taatgcttgt 
tcaatttaaa 
actgtattct 
caaacatatt 
acactgattc 
aatattctcc 



ccttgaaacc 
atactctttg 
tggcttcctt 
ttgaggctcc 
tcaagggcat 
caactctttt 
agaattcttt 
atgaaattaa 
agtagaacga 
aaatattgaa 
gtctttttga 
gtttacgagt 
gttctttcat 
gcctcattgc 
ctccccagca 
agtggcactc 
aaattatttc 
cttgttaccg 
aagtactctg 
tatcataatt 
tactcacctt 
ttttttccac 
caaaagtccc 
ttgcctgcag 
atctatttca 
gaagttaatt 
ggtgagccta 
agttaccagt 
gaagtcagag 
gccatatgag 
cttagtactt 
tatggttaat 
ccagccagaa 
aagtcctcag 
aatatcttta 
tgatgttttt 
gggttaaata 
gaaggaaatg 
gaaccaaatg 
gctcaaagac 
tcctgccagg 
ccatttactg 
ctgagggtat 
ctcgcccaga 
tggatcagat 
acaccactta 
gtgtaaatga 
accatgacct 
atttcagtta 
tagactgaga 
tgcatcggtg 
cctgtcccca 
gtcatgaaaa 
ttttcatctg 
gcttctaggg 
aattagaaag 
tcataatcaa 
atttttaaag 
gagataatac 
ctgtagttat 
tatagacatt 
tcctgtaata 
ggaagggttt 
111600
111900
112140
112320
ttttttttcc ttatctaaag ttcagtgtct
gagtgaggct gcagaactag gaatgactga 
gagtactgac tgtccactca cgttacatgc 
ccgctgctgg agaagaaaat gaacactgtc 
cgttagatga ggttagaatc gccttcccca 
tgcgtgtccc tcagaccatg tcaccaagtc 
gacaccaccg gtctgtggca ctgtgttagc 
ggtaaaactg ccaccatccc cgtgtgaggc 
tatgcagtca cacacaaaaa agaaaggaac 
agcagtagaa attattaatt tcatgacacc 
cttcagtttt tggaagatac tttctttaag 
ctgacagtat gcattattca agcaaaatgc 
atagacattt ctacatcatt catatgcagc 
agagcccttg gcaagcactc taggcgaggt 
gtgaggctgg ggaagatctt tggggagaga 
gtggcctgcc tagagccagg gagttagtaa 
ctgtagaatt agggaatgtt aacgtgtaga 
cttcagcaca gtctctgaag ctgatttgtt 
cttttaaaaa atagtaaggc atttaaacgg 
caasagccaa tcatttggtg agttttatta 
taccactttt ttaacctttc tagaaagttt 
aaagattaat acttactttt tggtcattaa 
caaacttcct ttgctgatga tcaattacat 
taaattatta cttttgaaga tctttcatct 
gaatttaaaa agaatgtccc taaacactgt 
ttcaggccat tatcccactg aggacatagt 
atcctcctcc tcccaggttt aagtgattct 
gcaggtgccc accagcaagc ccagctaatt 
catgttggcc acgctggtct ccaactcctg 
caaagtgctg gaattagagg cgtgatccac 
agacgctgca aagtgaaaca ataataagga 
taagggaact agcatctctt aagtgccagt 
ccatgggtct gatattattt ttcccatgta 
ttagtaattt gcccagtctc atccttctaa 
tttactgcca tgaatcaaaa gtatgcttgg 
tatatttccc cttctcttct tccttcattt 
gagaaatagc gccttttctg aaaggtgaga 
taaaggttat taacgctgaa gaaagcatga 
atagtcagta agttattaac tgtctccatg 
ttattcaccg tggcagtcac tatttttttt 
gatactaaca cctgtagctg atctttctct 
ttgtcttcca tagctagcac cttcttttct 
tagtgaaggc tattctaatg aaatatttta 
ctctgttaag atattgtagt ggttataaag 
ctctgaaact tgctgagaag attagatata 
catatatgta tgtacatgca gacttatgca 
caggataggt gggatatggg tgttttttgt 
gctgctcttt caagcttagt gctcatgaag 
ggctggaccc ttgggtagga tattagcttt 
cctaatgata ataatagcac ttaatgctat 
tacttctccc agactcaagt cctggcttac 
tcaccctatg cgttaatgtc ctcacctgtt 
ttggtgtgag gaacaaatgg gttttaaaat 
cttgtaatcc cagcactttg ggaggccgag 
agtatcctgg ctaacacggt gaaaccccgt 
cgtggtggcg ggcgcctgta gtcccagcta 
atctcgggag gcggagcttg cagtgagctg 
acagagcgag actccgtctc aagaaaaaag 
ctgtcataca ttaacactca atgaatgttt 
atttccaatg aggaaattga aacttaggga 
tataatccca gcactttgga aggctgaggc 
agcaggctgg ccaacacggt gaaaccctct 
gtgggggtac atgaatgtaa tcccagctac 
cccaaagcac cttcaggagt caggctctct 113520
ggtaagctgt gttgtggctt tgcctgctgt 113580
agtaattgga catatgcctt gaagtgaact 113640
tttatggtgt gtttcgagtc ttccagactg 113700 
gcggttctca tggtgtggac ctcagaatcc 113760 
cacaggttga aactgttttc acgatgctaa 113820
gtttgctcag atagagcaaa aaccatggtg 113880 
agggcagcgg tagctaactg tacaagtcac 113940 
aataatatca ctgaaaaatg actttgacac 114000 
tctatggctg atgcactatg gttgttcaac 114060 
gaaatgagtc tgacacctca ggaaaaacaa 114120 
aaggagaatg tatttgttgc cgaagataag 114180 
ttttgtattt tctgcacagc agcggcactt 114240 
agctgcccag taactatggc tttgtactgt 114300 
cttctgctct ctggttcctc actctctaaa 114360 
ggggagacga atacctcacc ttgatctctt 114420 
tgccattcgt ggtgtgtcct gatttgaata 114480 
cttctttagc aacagtgggg tccttagctg 114540 
agttcatgaa aagacaaaga cttgttattt 114600 
ctttggaatt cttaagtaag caaaaggctg 114660 
cctttcagcc tgttttcttc ttaattctca 114720 
ttccatgtaa ttaaaatact tcaaataatc 114780 
gtaatgaaag tacttcacaa tcacataaat 114840
tgagtagaat agggtaaact tagtatggaa 114900 
tatctgtatc atgaccccat tgcctgcccc 114960 
ggggtgcagt gacacatctc agcttaccgc 115020 
cctacctcag cctcccaagc agctgggatt 115080 
tttgtatttt tagtagagac aggatttcac 115140 
tcctcaagtg atctgcctgc ctcagcctcc 115200
catatctggc ccttccctcc aatatataag 115260 
aggcaaaatg tgcttaagaa cctggcaaga 115320 
gtattatctc atttaatctt aatggccatc 115380 
aaacctaaat aaatgaatat cggctgtggt 115440 
ttaatgatgg aactaactaa aagtaggctc 115500 
ggtgtttgct tcataataat tagtataaca 115560 
taattggtag atatttcatg tgaaatatat 115620 
attttttagt cttttgagtg ttttactgac 115680 
tatgtraact tacagtttga tgtggacatc 115740 
agatcatgtt gctgcttctg aagaactgaa 115800 
tctagttctt caatgatgga attttgcttg 115860 
tcttttattg actgtagttg gatgatgtgc 115920 
aggaaacttg taaggaaaag aattgttagt 115980 
tatttattga atttctactt ctccaaggta 116040 
taatatgatc ttaccagagc cctaaggaat 116100 
taaatgtgtg tatatatgta aacgtataag 116160 
tacacacaag aaaaggtacc ccatctggtc 116220 
attagatgct acagcgctca gaagaaaggt 116280 
tgcttttttg agaagggaga gtttcaactg 116340 
ctcctaaact atttatattt taatattaat 116400 
gtgagaaata ctccttcatg gggaggtgaa 116460 
cagccctgcg acttggaaca gtttacttag 116520 
aattaggata ctatcaccta cgtcatgggg 116580 
gtaaatgctg gccgggcgca gtggctcacg 116640 
gcgggcggat cacgaggtca ggagattgag 116700 
ctccactaaa aatacaaaaa attctccagg 116760 
ctctggaggc tgaggcagga gaatggcgtg 116820 
agatcacgcc actgcactcc agcctgggcg 116880 
aaaaaaaaat gtaaacgctt agactagcgc 116940 
gttaacgtta atatagacat tattattccc 117000 
cattgagggc caggctcagt ggctcacacc 117060 
aggtgtatca ctagagtcca ggagcttgag 117120 
ctctactaaa aatacaaaaa ttagccaagc 117180 
tcaggaggct gaggcaggag aactgcttga 117240 
acccgggagg tggaggctga agtgagctga
aagagcgaga catcgtctca aaaaaaaaaa 
aatcacatac ctaaagtcac acacagcagg 
tctgactctg aaatctgctt ctctcctttt 
taaccagcct atcgcatgta cttaatacat 
aagctttttt ttctcttttt ttgagatgga 
agggtgccat cttggctcac tgcaaccttc 
cagtctcccg agtagctggg actacaggcg 
tttttagtag agatgggctt ttaccgtgtt 
tgatccgccc gcccctgcct cccaaagtgc 
gcagaaaagc tttttaaaaa ttatttagag 
agacacttta ttaatggtta tatagtttgc 
gctataaata tgttcaatga agagcatacc 
taagccagag gaaacaaatc caagagagta 
tcccaggaac aaactcaaag acatgcacag 
tgttttttgt tttgagatgg agtctcggag 
tggcgcgatc tcagctcgct gcaacctccg 
agcctcctga gtagttagga ttacaggtgt 
taagtagaga cgtggtttca ttatgttgtc 
ttcgtccacc tcggccttcc aaagtgctgg 
catttttttt ttttttaata agatacaaga 
tcagcaccta aagaggcttt ctgtgataat 
aatatcttca ttttgtttgt acaaggccag 
taaaaataga aaaacagtga ccagatgtca 
ataaagttta gtacacatga atttgcacat 
tcatattccc ttttttgagt cccgtataag 
caatgatggc atctttatgt ttcagaatta 
tctgttaatg tcacattaga agctggtgaa 
gaaggcatat gaagagcaga gaaacattat 
accggccatt taacactgct gtgagttatc 
ttcctaggaa ccgaaagcat gtgaaattga 
tcagatcacg tcttaccttc cgtttaacag 
tgttttttag ggttttggag aaaaatcaag 
gcctgcggaa atttagatgt tttgatggtc 
atgaatgtct cccttgagga aaaactaccc 
ttatcacagg gaatacatat gaagatcatg 
gaagtcacct ttggtgtcac gagcaccttc 
tcccacctct gcacatggct tcctgtgtca 
gtgcatgtct gtctaggtga atatctatct 
cgatttcatg gcctctcggg ccccttttag 
tgacagtaac attttctgac tctttaaccc 
gctacccatt acatgtctcg cccataagca 
atattcaatt gtagctctta aatgtattcc 
attttggaga gatgggggtc tgtctttgtt 
agcgatccac ctgcctcagc catccaaata 
tggctaattt ttgtattttt tgtaaaaatg 
actcctggcc tcaagaaatc ctatcactct 
gagccactgt gcctggcttg aatctattct 
tataaaagat tatttaaaat gtggtttgtc 
aatgatatac ttgcaaataa aatcataggc 
tgcagaactc atacacctgt aaaatcatca 
tgttgatgaa aaaaaccttt ttatttcctt 
acatctgatg agaaaaagct tattcttcct 
aaaactattt gcaaagatga tatttggggg 
attgtagcat gctaaatttg aaacccaagt 
tctacaaact ggacatatcc taggtttgtc 
aagtaatctc cagctcccat gtgttccggg 
ctaagtaaga ctattaagaa aacgacgcca 
ctttaggatg ctgcggtggg cagaaggctt 
atatggcaaa accttgcctc tactaaaaaa 
ggtgtgtgac acatgcctgt agtcccagct 
gagcccagga ggttgaggat gcagtaagct 
aaccagagtg agaccctgtc tcaaaaaaaa 
gattatgcca ttgcactcca gcctgggcaa 117300
aaagaaaaga aaagaaatat aggaagaatg 117360 
tggcaggggc agaatacaat cccagcactt 117420 
aatgtggccc cattccttct ctaaaaaatc 117480 
aacagttaat atgtgagcca agcccttgaa 117540
gtctcgctct gtcacccagg ctggagtgca 117600 
acctcccagg ttcaagctat tctcctgcct 117660 
catgtcacca tgtcaggcta actttttgta 117720 
agccagaatg gtctcgatct cctgacctcg 117780 
tgggattaca ggcatgagcc accacgcctg 117840 
agctggtaaa attatgccat gtaagtccta 117900 
cttcctaatt tcaacttata aacatacgtt 117960 
acttttaaac taaaaatagt tcctgtccat 118020 
gagactatgt atttgagaat gttaactgtt 118080 
tcaaggtatt tggcagggtt ttttgttttt 118140 
tctcgcgctg tggcctgggc tgttgtgcgg 118200 
cctcccggat tcaagcagtt ctcctgcctc 118260 
gccaccacgc ccagctaatt ttttgtattt 118320 
caggctggtc tcgaactcat gacttcctga 118380 
gattacaggc atgagcaccg tgctggctgg 118440 
ggaaaattgg atagcctgac actacattat 118500 
tgcaggaaaa gcagcaacta aagatgtttc 118560 
taaataaagc tttcaaaata tagacacttt 118620 
gattcctctc tctgacattt tccttccaat 118680 
tgcagagttt tgttttaaag gaaggggacc 118740 
tcagctatct tatttaataa tgaaatatgt 118800 
ttttctgtct actaacaagt taccacagct 118860 
atattctata catttcacta gcttttctgc 118920 
tttcccacct gcttgataaa gaaaccttga 118980 
tgaagcctcc tgagtcactt tgcacttact 119040 
catacacgtt tcactgagtg atagttgggt 119100 
agatgtattg aacacctacc atgtacgagg 119160 
aaatgaaagc atcatgaacc atagtcttaa 119220 
ttcacatcat caagctaaaa agacaaggct 119280 
ttgtggccat gtaaggtctg taaatagaag 119340
gtttcactga agagaaaatg gagaccctga 119400 
aggtgaaagg aaggagcctt aggctgggaa 119460 
catgggcagc caccctgctg tggacctcag 119520 
aaataaagct ctatgtaaaa tgaaggcatt 119580 
ttcgaatgat ctggtaaatc cacctttttt 119640 
tgcaaacaat attaaccagc caaggaactg 119700 
ataacaatca gtattaataa taattattag 119760 
agccccctga tcgttgtaaa ttagtatata 119820 
gcccaggttg gtatcaaact cctaggctca 119880 
ggagagatta caggtgtgtg ccaccacatc 119940 
agctcattat gttgccctgg ctagtctcaa 120000 
ggcctcccaa agtgctggga ttataggcat 120060 
ataaagaaag caattgcact tttggggaat 120120 
caatgtgaaa caccatttgc atatttttgt 120180 
cagtcagaat ttaaggtaga aaacacagca 120240 
acactatttt ctttttttat tatttatagc 120300 
tcatatctgt gacaaaaaaa tacgatttct 120360 
acaggcatag ttgaaagcca atatgattgg 120420 
acataattga cccaaattgg tagttttagc 120480 
ggggaaacag tattcagtat tagggtatgt 120540 
acggacatca ttgtataaca ggcaagagaa 120600 
aatcactgca gcattttgaa gagaacatta 120660 
ggacggtggc tcatgcctgt aatcccagca 120720 
aaattcagga gtttgagacc agcctgggca 120780
aaaaaaaaaa aaaaaaaaaa aaatcagctg 120840
actcaggagg ctgacatggg agaatcacct 120900 
gagatggcac cactgcactc cagccagggc 120960 
aacacagaaa agaaaatgaa attagcagga 121020 
ttgttatatc tcaatgattg gtctcaaatg ttcatttact gtttgtagag gagaaatctg 121080
aaacatgaaa gaaaaatatt tgaattttaa aaatctattt gcttttcaaa accctaaatc 121140
aataatgact taaacttggt atcctaagga cagaaagaat tatttcagct tagttcttga 121200 
ttaacagtaa agaacaatta ttgaacaaga agtttatcat ttttggttaa gaataaagaa 121260 
ttatttaaat tgtcaaatag gatatattgt tatagccatg ttccatgttg tatatacatg 121320 
tcttcattaa aaacaaggaa ataggcacac caggtatgtg cataaaatta tcctcttttg 121380 
tcccaagtgg aacagacata tgaaaacagt ccccacctat cccctacaat tttttttcta 121440 
ttgttgatct tgagattttt ctatatttta tttaaatatt aatataatca tgtttaatat 121500 
ttttggtttt actttatcgt gtgtttgaag aggaaacatt ggatcataaa atgtgcattg 121560 
gcttacagta taagtgtagc tttcatacta tagaccattc tgcgttgagt gaagctaagt 121620 
ccccaagggc aaaggatctt ggtcaagtta atactgaaat aaaatgcctg ggccagtggt 121680
tctttcactc cacagcacta gctgtatttt tataatagat tagcatgtag aatactgagg 121740 
cagggtttgg aggattactc taagaggatc ttttgggcca gtggttcttt cactccacag 121800 
cactagctgt atttttataa tagattagca tgcagaatac tgaggcaggg tttggaggat 121860 
tactctaaga ggatctttaa ggggccaggg aatgaaaggt aaaatccagg actgtgttag 121920 
gagagctgtg cctgtgcagg aattttctcc aagccctctc ccttctcctc cctcatgagg 121980 
tttctgaccc ttacactaga catgaagaaa ctcaccattc tgataattca tcatttgaga 122040 
ccgactttca tatctggaaa gtgtgcagtc ctgaattata aawgttttag tactgttatt 122100 
acctgttctt atcttgcaat ttgtttattt cactggtctg gtccaaaatc tgtttttcca 122160 
atttgtttgt cgagagggag tgttccaaga gctgaagttc aagtctcgtg gtctgattta 122220 
atacctaaat gtaacaaaat gaagttccta ttaattattt tttaattagt ttaactttct 122280 
aacttccttt tcattaaagt acccaagcta caggaaaaca taacaaaaac attatttatt 122340 
aacccaagta tcttattttg gcatattttt cattttcaga aaaggctcaa tgtcttagat 122400
cacatctgag tgtgttaaac ctttttactc ttttccccac gtctctattt tttttttttt 122460 
tgagatggaa tctcgctcca ttgcccgggg tggagtgcag tggcatgatc tcggctcact 122520 
gcaacctccg cctcccgggt tcaagcaatt cttctgcctc attctcccca gtagctggga 122580 
ttacaggtgc gtaccaccat acccagctaa ttttttatat ttttggtaca gatggggttt 122640 
caccatgttg gccaggctgg tctggaactc ctgacctcaa gggattcacc tgcctcggcc 122700 
tcccaaagtg ctgggattta caggcattag ccactgcacc cggccgttat gtctctatct 122760 
tggaaagtgg ttagtagttc tggacaatgg ggtctgtgcc aaatactaaa tgttattttt 122820 
ctagtctgcc atattttatt tcatacaatg agacaagtag gagtagaaaa tggtcatatt 122880 
tcataggtcg aaagtatttt ccctttgccg aaaacaaaat gctattctca tatttatttg 122940
tcactagaca gagagattgg aagtcacatg cttccattat ataaaaatat agataatttt 123000 
tagcctggga tttcctcatt tgtcaccact tgtttagact tttatttctt cttgccattt 123060 
ctccttcctg ttttaaaact tgtttgaacc aatcgaagcc gtatagcgtg agtgtgaagc 123120 
ggascctcag ccttgccgtg cgggcctttg tgagctactg cgtggcatga gcagtgcggc 123180 
tctcccgcgg attctctagc gcctggttgc ccttcagcag gaagaatcga ytactcactt 123240 
cctccatgtc atgcttattc aggatgtgat atcacaygca aatgtcagtc agcattgttg 123300 
ccaaggaacc ggggaccttg aaagaatcat tgtttgctgg tgtctttatg tcatttgcag 123360 
gagccttggc tggtccacag cgtgagtttc agggatggtc ttatccttag agctggttta 123420 
gttcttatca caaaaagtct tctgtgagaa taaagtcctt ggccaacrta aggttttgtt 123480 
tgggttttaa tattaacacc tggaatatag atttggccta cgtcttcttt gagtccaaac 123540 
attctatgtt ggttatttct aaaaggaact ggaaaattgt gtcctgttta attcataagg 123600 
gttataacat gagtaaaatc ccgtggggag gcagggaagg atggcacata agtcatgatt 123660 
ggcccagtag taattgtaac cattttcaca tcacttttct ggagagcatc aaaccgctgg 123720 
accagcctga aggcgtccat ctgcagggga ctgtaaatta cccaggccag gtaatgatct 123780 
ctcattccct ttaagatatg agacctccag ccacccattg ttgctcaatt tgatcgtctc 123840 
tcattctgac cggcttggag aatcttgctt ctaatcagaa attttcagat ttgaatttaa 123900 
gtctgtttca caaaatcagt aactgctcag caagtacctt caaacagagt gggtacataa 123960
ttcagtttct ttgcggcctt ccttaagctc agccattttt cttttttttt tttttttgag 124020 
acagagtctc actctgttgc ccaggctgca gtgcggtggc accatatctg ttcactacag 124080 
actctgcctc ccaggcttaa gcacttcttg taacctcact aagcctccca agtaactggg 124140 
tctacaagtg cacacaagca cgcctggtaa tttttttttt tttttttttt tggtagagat 124200 
ggggtttcac catgttgctc aagctggtct cggactcctg atctcaagcg atccacccac 124260 
ctcggcctct caaagtgcta ggattataga tctgagcaag cgtgctcagc tggctcagcc 124320 
attttcatgt gttcaattgg gcttcacatg gaaaaactgc ttactttcca tctgttttct 124380 
tattttcctg ttatcctgga taacatgata tctagtttca caataggcgt ttttttttta 124440 
aatcatatga cgcaacacaa gtacatcaaa tgctatgaag tctctgaccg ctataggatg 124500
tagcaaggtt tgcattgctg ctctgtccta acactttttc attactatta ttatttttta 124560 
tttttttaaa tttttgccaa gctcccatgc ttggatctaa ctattatttt aaaatataag 124620 
aaatgttata gtttaaaaat gcttatgaga cattttttgg atgagctatt caattaccca 124680 
tcagtgttag tatcaaaagg tggggcatgt gacttaatca ttactaattt attttaatag 124740 
gttggtgcaa ttttgccatt gaaagtaatg gtggccaggt acggtggctc acgcctgtga 124800
tcccagcagt ttgggaggcc aaggcaggtg gatcacctga ggtcaggagt tcgagaccag 124860
cctggccaac atggtgaaac cctgtctcta ctaaaaatac agaaatttag ccaggtgtgg 124920 
tagcctgcac ctgtaatccc agctacgcag gaggctaagg cacgagaatc gcttgaactc 124980 
gggaggtgga ggttgcagca agtcgagatc acaccattgc actccaacct gggcaatgca 125040 
gtgagactct gtctcaaaaa aaaaaaaaaa aagtaatggc aaaatctgca gttacttttg 125100 
gtccaaccta ataataattc gctttagata tatattgata tattgacttt taaatcttta 125160 
gtttttatga cttcctagga tttaaatttt tagtacctta tgatccatta tgtaaaatat 125220 
ttatgtatgt ttttcctgaa ctgttgtgat attgtggaaa gacctggtaa tcaagtaatt 125280 
tgttattcta ttctcttatc tgtaagtctt ttgttaatct atcatttcgc tactgttttc 125340 
tctgacctca tccaaccatt tttaggaaga caatgaaaga acagctgtgt ccttctagaa 125400 
tgagtcttac gagagtggca gggcttatgg catctcccct ctcatgtcct ctcctggctg 125460 
atgtctagca tttcttgatc cttttagctg aagtagcatt taggaataat atggagtggg 125520 
gattgtttca cttaaatctg ctcttttttt taaaagcatt ccttgtagcc cagagtagga 125580 
agccactgac ttcagaagca tgtaaagaag ccaggatgag gagtcagaaa gcgggcttgg 125640 
ccgccgagag tcacgaccac ggctttgagc ttggagcgtc tgcatttgta ctgctaatag 125700 
cagcttttcc ctttcccacc caggccgttc gctgggtcac atgttgtgca tcatttagca 125760 
tgtctctcgg tgaattttct tcttttgaaa ttttcctatt ttgctgttat tttactagtt 125820 
tctttctttc tttctttctt tttttttttt ttgagttgaa gtctcactct gttgcccagg 125880 
ctggagtgca gtggcacgat ctcaactcac tgcagcctct gcctcctggg ttgaagcaat 125940 
tctcctgcct cagcctccca agtagctggg attgcagatg cccgccacca cacctggcta 126000 
atttttttgt attttttagt agagacgggt tttcgccatg ttggccaggc tggtctcgaa 126060 
ctcctgacct caggtgatcc acccatctcg gcctcccaaa gtgctgggat tacaggcgtg 126120 
agctactgcg cctggccact agtttactat ttcagtcttc tttctgttat tattaatcac 126180 
tagctcatag aatctcacag tggaaagaga acttagcaat cacttgtctg gcccaaccct 126240
ttatattatt tgaggcccag aaaaggtgag tgcctcattg tgatgcattt atttggttag 126300 
tggcagacct ggagccatgg cagcgctcag ggctcttgct cgggcgtgca ccatcttttc 126360 
tgtggctaga cgcttctcac tgtcccactt gtctccttct ccataatctc attccacagg 126420 
ctgtgttagc tgttgagatt caggtttcat cttaactcaa gagttagatt taaggccaga 126480 
gtttctagct ctttgcctca gtgcttttca tttctcaaat gttcaaagac tttaggactt 126540 
agaaatggaa aatgattccc ggagtccaga aagcaccagg gagacagagg gggtattcat 126600 
cttgcagtgg ttgggatgcg tggcatgaaa atgactcaca tgtcttcagt agatagaaca 126660 
catgaaattt aacctcagta ttaaaaacaa aaacagattt actgattttt aattcataag 126720 
cagccataca tccttaaytt cttatcaatt cattcctttt ctcctgtggt ggtgctttct 126780 
ttagtttctc atgccttcat tgaggaagct cctgacgcga ctgagtgcta gtctctagct 126840 
gcagggacac cgtgtgcttt atgtggcatt acttacttgg gcttccacat cagttaactt 126900 
ccgcgtttgc tccgctgttt ggttcaacag gtttgtccct atttctatca tcacagccgt 126960 
ctggttctgt actgcattct gctgtatctc taccatttct ttcttcatgt tgtcctggat 127020 
ataattctca agctagaaaa gaacagtgtt ggaaggcagt cattagtcaa atgaccggaa 127080 
acctgattcc taaatgtttg tcatctcctc cctatcttta aaaaaaaaaa aaaaaaaaaa 127140 
tctatcaaaa gacttgtacc ttgccttccc ttttggaatc ttactatttt tttttatcat 127200 
taggaaaata cagtgtgatt ttatttttat gcaaaatctg gcaacttagt cacatcatgt 127260 
aaaggaggga gacaagctac tggttgcttc tgtgttcttc tagaagtcca tgtcatggca 127320 
ggccacagag ggtggtgagg gcagccacag ggactgctgg gtgctgccac tgtggggttg 127380
tgtctgtcct acccagctgc aactctgacc atgcagtcag gaaatgataa tttgacacaa 127440 
agaagcatca ctatttctct cacattctag acttttggtt tctccacata gacttgagaa 127500 
gacactctaa gacagcatat aaggagagga gcaccctttt gattttcctt ttaacctacg 127560 
gaatcaccac tcagttccac attctgtggg gtcttcccca ccttcctccg tattgagtta 127620 
attcgaccta ttaaattttt cctaacatgt atgcattttt cacaattttg tcatttcatg 127680 
tatcaagcaa acttttaatc gcaccttggt ccatttatca cctaacgtgc catgggctgg 127740 
ttcttctctc cctcagttac taaagatgat gatcatgccg actaatttta gcattaactg 127800 
aaacacaaga gaaggaagaa gctcatttca ctgccattgg tatagctatc cctgtctatg 127860 
gcagtaaaat tacatgatta tgtataactg caagacaact gagtacgtgg gaagagcctt 127920 
tgggcttgga gccagggaag cctgccctct gctttatagt cttggttcta ggaaagttgc 127980 
ttaacctttt gggaccctag tttcctcata tgtaaaatag ggtttctggt tggtcagagg 128040
agtgtcttaa agaggggtta agctgtgctt ttaaagtcat tgtgtatgcg taactccaga 128100 
tacttagcgt ttagtttctt tttttttttt tttttttttt taaataatct aatgatggga 128160 
accattcttc cattccctgg tccaaagtat aagctcgtga gtgcacaaas catgttttct 128220 
tccttttcac atagtgtaac aaacattgtt tattacattg aataattgaa agatgattat 128280 
aaaactggtt ctggtgccct cctttaaaaa cttagaattc tttatagagr aamcattcgt 128340 
ggagtcagtc atcagacatg atttccccca aaatgttaac cactaaataa ttctgtgctt 128400 
tctgtcttta agagtaggaa aataggatgg gaagggtaga gtttctctct tagagcttct 128460 
ttgttgatgc atttcataga ttgtgtcttg tgactggtat cagatggttt taggattagg 128520 
ctggaactat aagtttcctg tttccgatgc cccctcgcca tcgactctgc cccacttctc 128580 
taagctccca gctmcctgca tgcccctcag
cgttctcccg tggatattgg atgggtcaga 
cctccatctg tggcctccac atctacgggt 
aaataacaat acaacaataa aaaacaaaaa 
acatggcatt gacattgtgt taggtattct 
agaggatgtg tataggttat atgcagatct 
gtggattttg gtattctcgg gaatcttgga 
tgtactagag ctccaagcat gtgtaaaatc 
cccgcctcag ccccccaaag tgctgggact 
ccttagtttt ccatatgaac caaaacagag 
aaaaaatttt taaaaattta aaaaaataaa 
aaccatctct attataggct tcataaaata 
tcagagttag ctcaggcttt ttagtgtgtg 
ctcccctgcc tttataaaac tgtaactttt 
ataaatagca gttcgtaatt ctcccctccg 
aggttgcagt taaagctgtg tgtcacccag 
cccagccaag ataatattta aaaagtttca 
ctccagctgg agtttagttt aagcccatac 
ataaaaagtc ttaacctcct tcttgatttc 
attttgtcac tctaaatcat acaaccagag 
ccccatttct agagctactg agtcagatgt 
tgtttataat tgaaaactta aagtcaaatt 
aagacataca ggcccaattt taaaaaataa 
acataaaaac aatagaatgc aagatccttt 
gactaagtat aggttcacag tgggtgagct 
aaaagcagat tacaaataca catgatgtgt 
tatatagtgt ctataaatat atacagctct 
ttctgtgtga tactgactgc attgctaatt 
gaaccaaaaa tgtgcagtag tagatatttt 
cgtgatggct catgcctgta atcccagcac 
ctctcaggag tttgagacca ggctgagtaa 
aaaaaaaaaa aaattagctt ggcctggtag 
gctgaggtgg gaggactgct tgagcccagg 
ccactgcact gcagcttggg cgacagagtg 
aaattaaaag taaaaatact tatgttctta 
agaaatatat gatgtgacag tcaggtactc 
agccccagaa acactagcga caggaacagc 
gccgkcactg cctgtgctgt atgggaatcg 
gcacaggtgg atgcagggct gagcactgga 
agaggccgct tcatactttt ccagcctttt 
gcttgagtat accttccaat tccaggcttc 
cacccctcag tggagggcca tttttaccac 
gccaattcta tcttctggga gcatcctgac 
gtagtaattt ttagagatac agaaagacta 
catttccaaa taaagatgtc ccatttaatg 
gataatgaga acaaacctag aaacaaagcc 
ttgaaatttc cctgtctctt gaccttgatg 
aatgtctttc tgtttagtgt ctctcttatt 
tagaaacctg cccagaaata tgaaattcca 
aaaaaaaaaa aaaaaccaac ccagtaaata 
acttgccttt ggagaatgtt tcccatccct 
cagttccaag atacttgaac ctcccgtggg 
catagcatgt gaaacatatt catgtttcgc 
agaggattat tccatgagcg tgatctgtag 
atgtgtgcga ggacagtgtg tgttcatttt 
aaagtgtggg gccttgggct gttcttcctt 
aacattgagg gccacagtga tccttctagc 
ttgctaaaag agttgcgccc cagacatagt 
agcaaccagg atgaaatatt ttaatgcaac 
atatatattt tagtgcaaaa tatgttctga 
tgatcaaatt tgacaggaaa aataggtcca 
acatttaaaa gtttatctgg ctatcttcct 
taccaaaaga aaagtaatct tgaaactggc 
cctggtcact aaggctgcct ccctggcagt 128640
tgagcaggat gcatgasagg cacagtcagc 128700 
tcaaccacac catcaatata tttwaaaaaa 128760 
ttgyaaaaca atacagtata gcaactattt 128820 
aagcaacctc gagatgattt agagtatacg 128880 
accctgtttt acggaagagg cttgagcacc 128940 
atcagtcccc cacagacatc aagggacagt 129000 
attttgttga aatgttactc aagccatcca 129060 
acgagcgtga gccagcacat ctggctgaag 129120 
tagaccacta ctttaaaaaa ttaaagtatt 129180 
aaatcagtca ctgatacccg gcaggccagc 129240 
tgaagagtct gaaatcttac taaccctttc 129300 
tgatctttct taattcattc tttctccttc 129360 
gtgattgaaa taaactattt aaaagaagcc 129420 
ctcatcgcca tgggagtaat ggaatttttg 129480 
aggcactgtc ttagttactc ctcacagcac 129540 
ttccgggagg cttggaacta tagagataga 129600 
tcagaaataa taatttacaa agtggtataa 129660 
agtacttaag agctaaataa aaattattgt 129720 
agggaaaatg aatcctctaa tactgccttc 129780 
gtttgcaact ctccagagat atgagaggat 129840 
ccaatttgaa attaaactta ggaactttga 129900 
aatttcttaa cctgccatat tgttttctaa 129960 
ttaaattgct actttttagc tattcaggat 130020 
aatgtgtgtc catttatgtt aatcttacat 130080 
gtatatacag ataggtatat agcatatatg 130140
tgaagcatgt atcatttaaa taaaagaaaa 130200 
aattgaagtc tttgggagaa gaatggaaca 130260 
gtgttgattt aaaaagatat ttgagccagt 130320 
tttgggaggc cgaggcagga ggattgcttg 130380 
catggtgaaa cccatctcta caaaaaatac 130440 
tgcgagcctg tagtctcagg tactggggag 130500 
agagcaaggc tgcagtgagc catgatcgtg 130560 
agaccttgtc tcaaaaaaag aaaaaaatta 130620 
ctcttgaagt cattaaatta aggttttaag 130680 
tttaaaaaca aggaagaata ctgtatattt 130740 
cacagtaatg gtaggtactg tttcttggtt 130800 
ctgtgtcggg atcccaggcg cctcacatca 130860 
atgaccctca gcaaaatgtt agctcaaccc 130920 
aagagccaaa agtgatatat ctcaaaattg 130980 
acaatgcctt aagaaaacag acagaccacc 131040 
cagaaaagcc cagaattaaa gatgaccaat 131100 
aaaagaatct gtgttttctt ccaaagatta 131160 
tggatgtcca tcatatagta taaaaatgaa 131220 
tagcctttcc ataaatcacc acgtatcaag 131280 
atctggctca tccacttgga tagacagacc 131340
aattagttat tttctagttt attgtcctag 131400 
tttactggct gtgactgaaa cccagaaata 131460 
ttctaagtat aaggaagtct tagtacaagg 131520 
agccatcctc cactggcagc accaaactcc 131580 
gtcatctgca ccgaactgct ctcatcaaaa 131640 
aggggacccg gctctttcca atttcacatg 131700 
aggaatgttt gccatcgcct tcatatctga 131760 
gcacacgtgt ctgaataggt cctgctgtat 131820 
gtcctcttct tgatggttga cacagtcggc 131880 
tctcagaact caagtgagtt atgcaagttt 131940 
tgcatggttt gctgcttagt gttatttgat 132000 
ctttaaaact tggcagcgca tcgaaactca 132060 
atatatatat atatatgttt acattaatat 132120 
agttttttat tactcccaca acgttttgaa 132180 
tttgtgaggc aactatggca gattgattac 132240 
tctcaccaag attgtcatca ttatttttta 132300 
tcagtaaagg aaaacataga taatatatga 132360 
gtagcatgca actcttcata aatttgtcac
tttgttctcc gaaaattcat gttgaagtcc 
aagtagggcc attgcagttg taatcagtta 
cctaatctaa tgtgacagat gtctttataa 
ggaacgcctc gtgacaacgc aggtagggac 
caaagatgag cagccactgc cagcagttag 
ggctttggct tccagaagga accaaccctg 
ccagaactgc gagagagtgg atttctgttt 
ccctagcaaa ctaatacagc ctaaaaaaaa 
aaatataacg ctaccttgca gcctccacca 
ttgttcaacc tcaggagggg gtgaaaaaag 
agcttcattc tgcagcagta aaagtgtttc 
caccctataa aagaaggtcc tctttcatga 
tgacctcagg gcctccagca cttaggcact 
gttagttgat aatggctcat tatcctcgct 
ccttattgga atccttccct ttccctttcc 
cttccccttc cccttcccct tccccgtccc 
ctggagtgca atggtgcgat ctcagttcac 
cttctgcctc agccttctga gtagctggga 
tttttgtatt tttttttttt tttttttttt 
atgggttttc accatattgg tcagggtggt 
gcaggtgagc cacccgcctc ggcctcctaa 
agtgcctggc ctactgtctt ctctaaaatg 
ctcagataaa agcaatggcg cctcctttga 
tgcggaattc cttctccctg ctgcctgctg 
aattcctgcc actggaatta cgctctggac 
cactgggttc ctgctgcaca ggaggccagg 
aagggaatct cgttaatcca ggtggccagc 
agacagggcc tccctgcgtg gggcttctgt 
tcatggggcc tttcccttcc cgtcaccacg 
agccactaga tgtataggtc agcagctcca 
ctgatccaga atagatcgtc ctggggtaaa 
ctgtctagta aacacactgg aacttccata 
ttatatgcac gtattctgcc attccttttc 
tactgtaagc caaagggctt gcatttgaat 
aaaggcagaa taattttata tgccacaaag 
ctaagcacta cactgtatta ttctaatcct 
gtgctacttg tacaagagat acaaattaag 
acctttagta attcttctta atctccctac 
cagaatctat gtcttcctcc gcctccggag 
tggagtgtct gtgggggtag gtcctctttg 
tcctgcatct ctgaatttga agcgaggagt 
cccctgctca gtagatcaga cgtgttctct 
gcattgtttc ccagacactc gactgtcccg 
ataagctcca gccactgtag tggctcatgg 
aactggtatt tctagtaaag cactcagcca 
attctaaaca gcattttcgt tggcaaaaga 
aaaactttgg gaaacccctt tcctgaatgt 
ttcagaacag aagaactaat atccatgttt 
atgtaagtac atttagtgat taaaagggaa 
aaatctcact atgtaattgt tttttcctct 
aaatgctagt attttggaga aaatagaaga 
ctgcgtggaa gatgtgtatt ttggataggt 
acaggctcat tacaggtctg agcaaatgtg 
gtaaaggcag gtggcacctg gtatggctca 
ctcggctcct ttggaagagg ttcaacgttg 
attcctggag ctactacttc ccagggcaga 
aaatcctagc attgggccag ccatccagaa 
atttcaaagg gaatagaata ctgaggccct 
aactgcacca gactgagctg tgtccagagg 
tcacccaaca cctgacacct ccatcttggc 
atctgttaag tgagagtaac tggcaggtta 
ccaggtatgg tgtctgatgc gtgagttcgc 
ttctttgaag gtccttgtta tgagttgaat 170220
taaatgccct cagcacgtga ccgtcttcgg 170280 
agatagggtc atactggagt ggagtgggcc 170340 
gaggacggtc atgtgaagac agatacagga 170400
agggtgaagc ttctacaaaa cagggaacac 170460 
cagagaggcg tgggacagat cctgcctcgt 170520 
ccccacacct tcacctcaga tttctgctct 170580 
aagcaagttt gtggtacttt gttacaacaa 170640 
aaaaaaaaaa aagtaatagg aaaggaatta 170700 
aacactgttg ccatttggtt cttctccttc 170760 
tccaggcagc tcctggtgat agctatgcaa 170820 
ctagaagtac taaggctcgt taattgcagc 170880 
agagcctgtt tctctgcagg aagatggggc 170940 
tatccatatg tctgtaacca ttgttgtgag 171000 
aaaatgaact cgttgaagta tgaggccagg 171060 
cttcccgttt ccttttccct ttcccttccc 171120 
tttagatgta gtctccctct gtcccccagg 171180 
tgcaatctcc acctcccggg tcaagcgatt 171240 
ttacaggtgc ccgccaccat gctctgctaa 171300 
tttttttttt tttttttttt tttagtagag 171360 
ctcgaactcc tgacctcagg tgatccgtcc 171420 
agtgctggga gaggcacagg cgtcagccac 171480 
gcatctgtgc attcatctca gccgcccctg 171540 
aatctgagag acgcagggcc ctgcccattc 171600 
tgaggaggcc ccctttgcca cggaacctga 171660 
aagcggcaag atactccttt cagtcccagc 171720 
gtgctgtgaa cctgctctca gccccgggca 171780 
gcctcttcct cagagcatct gcagtgctgc 171840 
cctccacact gtggtgctgc tgggatgttt 171900 
tgtgctccag aacccggtgc atttggatga 171960 
catagaatcg aattatcaaa tgcacactac 172020 
cacattcaca tattctgaat gtacaaatgg 172080 
attattgtcc ttccagataa tttttcaaga 172140 
aagacaactt tagaacttcc tttggacagc 172200 
atcttgcatg aagctaaatc tttgttcatg 172260 
ctgcagtagt gtgttaggtt tagtagatgg 172320 
attttcacaa tttaacaaat gtgagacacc 172380
gaatcttcaa tgaccttgta gcctagaaag 172440 
agagctaagt gatccagagc tgaattaatc 172500 
tagctctaga aaggtcaaac ccttccgaga 172560 
ctgtgtgcga tcctgtgaga cagcgggatg 172620 
ttttctgcta tgtttgggga gagcctcact 172680 
tctttcacca cagctacaaa caacacactg 172740 
atgggcattt ggacatggtc tatgagagga 172800
gagagggaaa tgggtagaaa ttctttccca 172860
gagcctgcag ctgttcacta ttccatatca 172920
aaagtgagaa aacaacaaag cttgaagccy 172980 
gtttacttag ggcttaaaaa tatgcctgtt 173040 
tctatgccga tttttcagag tacattttaa 173100 
aaatacttga tcgttttcta aacataacca 173160 
atttaagagc agaatatttc attgctacca 173220 
actagaataa gtagtcagca atacaaaacc 173280 
gtcaacatgt ccaagctctc agtgacaaac 173340 
ccacttctca ggaagacaag gcagatcaat 173400 
gactcgcacg tggttctcca cagagctgct 173460 
ggagcacagg ttgcttctct ggcccatgtt 173520 
gttcgtgttt ttcgttcata aatggcctgg 173580 
cagtggagct gcatgatctg gtctggggat 173640 
gtgggatgga ggctgcttcc cgatattgag 173700 
aagggagaac gtctttcatt cacttaaaac 173760 
atcatccacc tgtagcctct agccctcttc 173820 
tttggagagt gaagtgacat cggcagagtt 173880 
cccctttccc ggtccccttc tcctccattt 173940 
aaactatccc caacttggag attctgatgt
agagatcctt tctatttaaa taaaattcaa 
tgtccatctg agtgacctaa ggtggacaaa 
gaattttctg gtaaattaac aaataatttg 
catttcaatt tcgaaatagg attttgcaac 
atcccaatga tccatccatc ttcccaccca 
tttttctagt aaattcacaa ataatttgag 
cttctattta gaaataagat attgcaacca 
taccccccat gatcccatct tcccgcccag 
cacgcatccc ccagctgctc tcctttcatc 
agctggtgac cgcagctcac tttcttccct 
tccttctttt taagacacac accgcctcct 
acctgggcgc gcactgtcct tggctgtccc 
ctttgtatct cctccttgag tcttctcttc 
gccctmagca aagatactcg ttttgtgttt 
tgtcatttca ttttaatctc cacagacaat 
gtgaacagaa tcctcaaact ctgcaaccat 
aaacatccac tcttagaatt agtttgaaaa 
attctatttg ggaggctttt gacctaatgt 
aacatcattt gaggtctcca gacagaaaag 
ttagagttgg gcttgtgtgt gtgtgtgtgt 
cagctgtgta ctcagcagta cttcatggca 
agatgcggat tttgggcagc actttgtcct 
ggcacagtac ccacagtgag aggtgatgtt 
tctccatata ttgatgccag atttgaattt 
acttgccatc tgttgactgt ttttatagtc 
cctagaagtt gaggcaaaag ctaaaggccg 
ttcctagtgg gtattgtgac ttctcttagg 
catggacgcc tgcccacata gggtctttta 
tagccttttt gcttttttct agtcatactt 
cgctctgtct ctcaggctgg agtgcagcgg 
ccctggttca aacgattctc cccgcctcag 
accaccatgc ctgggaaatt tttgtatttt 
gctggtctga aactcctgac atcaggccat 
attacaggtg tgagccactg tgcctggcca 
tgtatctaga taaccaaccc ctttcctact 
tttggccact atataactcc aagatgtatt 
gcattctaac attaatagtc catgcctctc 
ttgagtgtac cagaatgtct gtgctctggc 
gcagctgcat attctttgca ttaatttttt 
atatattcat atcattttac ctctttgtgt 
aaggaaatgt gcttttccct tcaaaatgtt 
ttctcatcaa tagcaggcat tttaaatata 
tcatgtaaat ttctttgggc atttcatatg 
tttagaggtc ttgtagggca catgtatatt 
ggcacgttat gcatacctga cacttgcaca 
gcaacagacg ttgtcaggcc acgtctgcat 
cagcagctta caatgacaaa atgcttctca 
agattctcac actgccttag cttgggtttc 
catgcaggaa gtttatttag gcagtggtcc 
aggcaggggc gccgcagggt ggttcrcaca 
ctggtaagaa gccccacagg atctcccaag 
cctggggcaa gaatggggac tgctgtcccc 
tgacccaggt tttggagctg tgcttgcgag 
aggcagcttg gagccaacgt ccctaggcat 
aaggcctgtc tctgagatgt cctgaagagc 
gcacaagagg gtgaattctg agcagcacca 
ttgaggcccc tttaatgaag gagaaaaatg 
gtctcatggg ccccaggctg tgggcagtgg 
cacgtggaag gggagcttgt atttatagcc 
aggaaaagca gtgctctgag aaagacaata 
gagacaggaa accaaagcca gggttgtcat 
gcgaatgcgg agacgctcgg caggtccttg 
tgatttctca ccaactgtag atgctggttg 132420
ggtccttaga cctttttact aattagtttt 132480 
aactaattaa tcagagggtc taaagacctt 132540 
agatttcctt ggaaactttt tactgttgcc 132600 
catctctcac acacacatac acgtttttct 132660 
gcccttccat ttttctagta aacccttgaa 132720 
atttccttgg aaacatttta ctgttgccca 132780 
tctgtcttac acacacacat acacgttttc 132840 
ggtccactgt cctctctgtc tcttgctggc 132900 
ccgctgtcag agtcaggaat ccacatgcaa 132960 
tgcaggtttg cctgaggata aggtccagat 133020 
gactggcgcc cctgatcttg tgggcctcag 133080 
agctgcccag cggcctcata ccacggccgc 133140 
ctcctcaggt cccaccatcc cccattgcat 133200 
ccttttgata tcaaaaccat tttgtatttg 133260 
aggttaatgt tcttgcttgc ttggtgaaga 133320 
tctacatata caccctagta acaacaagca 133380 
cttgagtgta agattattaa atccagggat 133440 
tcttggttcc ctgtcatgag gaaactctga 133500 
tggcaaaact gggctctcct ccccctcctt 133560 
gtgtttattc tggagatttt gctgcctaag 133620 
gaggctgagc ctaaagaggg aagggctggg 133680 
cctaaacccc tcgccagagc ctggggggta 133740 
cacatgccct gtgacgtggg aagcaagttt 133800 
ctagaaccta gaaaagccca tgccaaagct 133860 
ttggcctttt cttcacgttc agtgtaaggc 133920 
agggagggaa gcctggcctc tggtgccaat 133980 
gagcacactt gccttcacct gccctgacca 134040 
agcacttcct gaaatggatc tgttctgatc 134100 
ttttattgtc ttttttttga gatggagtct 134160 
cgtgatcttg gctcactgca acctctgcct 134220 
cttcccaagt agctggggtt acaggcgcac 134280 
taatagagat gggttcgcca tgttggtcag 134340 
ctgcctgcct tggcctccca aagtgctagg 134400 
tttaataatt tatgagtgac tatctgatac 134460 
ttcgctagta taagagactg aaagttcact 134520 
aggaaataag tttgtgggcc tcagctggtg 134580 
ctcctgtgga taggtacacc ctacagtaat 134640 
aaatcctatc cgctttgctc ttctttgagt 134700 
tcacatatat ttgaatatat gtttttccac 134760 
gtttccctta ccactactcc aaaatttgat 134820 
ccatttattt tctactgata aagtggctat 134880 
tgtaagttta aggagactgc tgtagtaacc 134940
caaaaggtgt cacattttac acgagtgtct 135000 
taccagatgt ctgtgagcgt gcagcctcat 135060 
gattcctgga agatgaggag caaatacagt 135120 
atatagatat atacacagca agaatagtta 135180 
gtgtgtatgt gtgtgtacct ctgtctcacc 135240 
cccaaaagca gagcctgaga caaaggcagg 135300 
cagagcgcag ccatgccgaa caggcgcggg 135360 
cgtggactca ggcggccacc gccgcgctgg 135420 
gagccctggg acagtgtctc agaacatcca 135480 
agggggcagg tggagcctag tgggcattca 135540 
agtgccgagg aggctctcat gggtgtcccg 135600 
ggcctggggg tttgtgggaa ggcctgaggc 135660 
aagttgggcc cagagggtta attccgagca 135720 
gagggtttcc ctgacacagc aggggatgct 135780 
aggcttagag aaagtcagtg cccaccccaa 135840 
ctaaagacag gctagtgggt aactcggggc 135900 
cccagtcagc agcgctggag aggagaggag 135960 
tttctagtag attggggcag ggcaggcctg 136020 
gcaggagtga gatgaggttg cagcagcaga 136080 
gtggcctctg agttattctg cagacttctg 136140 
ccattcgtct attttttggg atactttgtt
ttttccctac agagatagaa gaaaatataa 
gtagggccgg gcatagtggc tcactcctgt 
aggatcactt gaggtcagca gttcaagacc 
tacagaaaat ttttttaaaa attagctgga 
ctcaggaggc taaggtggga ggattgcttg 
tgatcgtgcc actgcactcc agcctgggtg 
aaaaaaaagt actgaatgat aaatgattac 
aatgagattg atgcatttag gctacaatat 
agcgttcttg cctttctgaa tttaaggcac 
aaatgtctca gtgtttttct atgccaaata 
tatgcataca tttgtctaca catgtgaggt 
gtgtgtgtgt atgtatgtgt gtgtgtttca 
gtgtgtgtgt gtttcagttc tctaagaaat 
ggaggcttta gaccctgtaa tattgtcgga 
ggtcacagtg ctagctggta gcagagtact 
ccagcactct catcgctgta ctgatatgcc 
tctgaggagc attaacattc ttactttttc 
ggccttttac gtctcctctc tgccctgttt 
gccaaaatac tgggtggagc tctgtggagg 
tttttttctt tttccaataa actgaattta 
gaaaagggaa aaggctagcg aaacttagat 
gacattttca aggcaaacta gtttttgctg 
tgtttctgtt ttaaaaaggg agaggagcgg 
aatctcacag cagggctgtg cgtgccctgc 
cgcaactcca cccccagcca agactttctg 
aaaaagttga aagaattcca atggatagaa 
atacacagga ttaatttaca cgaagactca 
agatgcacta ctgacccgcc gtccgcaaat 
tgatggctct ccaacgaact gctcctcgtc 
tcctgggtta gagtaaatgg atgcaaacac 
tcctgtgagc aaaacgcact gacgggcaat 
catctgcgga ataaacccgc ccaaaccata 
attttcagaa aagaaatgtg ttctttcctc 
gcagcttttt ttaatgtaat gatttcataa 
taggaagata atctacccct tgtattggat 
agttcccatt ctcccaagta tttaaagctg 
cataatttaa accccctgtg tgcatggact 
cagaagctga aagggcgcaa tctttttata 
gggctatgtt ttgcacctta atctttaaag 
ttttagatct gatcagcagt agaatgtttt 
caatattcaa tacctagatg atgtggcaag 
aatataactt ttatcttcat ctcttgatct 
atcaacatat ttaccatttg ccatttcaaa 
ttatggaaga ccatagaaaa ccccataaac 
cagcaaaaat gttctaaaag cacatgcact 
ctggaaagag agactgtgaa ctgcacatgg 
tgtaattatg tcttccaaga cctcctaacc 
tatgttacgt caaactgttt tttacaaaat 
tcggagagag tttgtcgaag tttttttcag 
gggttgagta agaaccgaca tgcacaactt 
tatttctata ggattcctca ttctaaagta 
aattcattaa gataaagaca cagatacagc 
ctcccttacc tcctgacttg aatctataac 
cgaaatatrt tttgaggcca attatgtcat 
ttggtcagta gtctcactag ttttgttctc 
gttgctccat ggagactggt atgatgagct 
cataatagat cgctgtcaaa tgaatgccat 
caccctagga gtgcttacta gatgatccat 
actaaaggtg atagaagtca gaataatgtc 
ctgagcagtg gcacgaggga gccttctgga 
cggtgctggt tacatgagtg aatccatatt 
ttagtatact ttatccattt cctgtgtttt 
aaattctcag cttagaagat agtagtgatg 136200
aactattttc tttttaaaac tgtactgaat 136260 
aatcccaata ctttaggggg ccaaggtagg 136320 
agcctaggca acatggcgag actccatctc 136380 
catggtggct catgcctgtg gtcctagcta 136440 
agcccaggag gttgaggctg cagtgagccg 136500 
acagaatgag actcaatctc aaaaaaaaaa 136560 
aaatagaagc agttaaaatt tagctctagg 136620 
accaggaact tcctttttaa atgaaactag 136680 
actgaaagaa aaaataataa taatgtaaca 136740 
gaatcttatg tatatctgtc tagagacata 136800 
aggggtgtgt gtgtgtgtgt gtgtatctgt 136860 
gttctctaag aaacagacat tccaaaactt 136920 
agacattcca aagcttggtg ggcaatggcg 136980 
gtgtcactgt aagagggacg ctagcgcctg 137040 
aacttaaacc ctggtctccc aacaccccat 137100 
tattcttatc ttaaaaaaaa aaaagtgctg 137160 
atttttgaaa tgaagtataa agatactgat 137220 
ttgctgtctc tttctgtgtt acatgggttt 137280 
aggtagcatg atcatctctg aagtgggcag 137340
cttggtcaca atgactatcc taaatggcta 137400 
gattttctaa atttagataa ttttctagaa 137460 
tcctttataa ggccggcagg aagcgtgtgt 137520 
acttgggaat gctgatggga atgcttgaga 137580 
cgggtcccac tgcctctgga cagaaacccc 137640 
cttctttatc tcctctttct gctagcaccc 137700 
tttttgagat aatattggaa gatgctcaaa 137760 
gcgggaacac aagccatctt ctgtacatga 137820 
gtgtttgtac agttactttc tcagtatggg 137880 
tcctgcctgg acacccttct ctctgtgctt 137940 
acatttccgt gctctgcagc aacttgagac 138000 
gtgcgtgggt cgtggggagc atccagctcc 138060 
gggaaaagcg ctgtcgtata aggccagggg 138120 
tttgattttt gtgttcataa agctgtaggt 138180 
ccgctgaagt tcgtgctttt ctgaactatt 138240 
gagatgatct gtcccttcga cctctggttc 138300 
cgagtttttt catattttca tatttattta 138360 
ttaaggagct gtacatctgc ctgggctttg 138420 
actcacatta gaaacacaga ttatttaacg 138480 
ttgcaatata ttttaagcat tttaaccttg 138540 
cagataagaa acaatggagc aaaagcaaaa 138600 
acagagaata gtataacttt ttgttttcca 138660 
gaaatttggt aggaagtgta acaagtacga 138720 
tgttgatagt gaagctggga cctctgttta 138780 
acgttctact tctgtctgtg gccagcagtc 138840 
gtgttccgtg atgattatag tttgactgtg 138900 
tgattatgac tttgggcaaa tcactgaact 138960 
caaaataaga gagtatttta ctacaaaata 139020 
accagctcta gggatgtttc caagtcattt 139080 
ggtgtgtcat tcatgtattg gagggggaga 139140
ggccatgaaa tgaagcgcaa gcacatattt 139200 
atttttacag aaaatggcac tctaagaagg 139260 
atttagagtt acactttgcc ataaaagagc 139320 
atctgctgaa ctgtcgacat caggaagact 139380 
ttcagattga acctgctaac atcagattct 139440 
acaatggaat tattattttg atttttaaat 139500 
catgctctgc agttccattt taacaaataa 139560 
cacagacatc atgttgggtc acagaagcca 139620 
ttccatgaag ctcaagaatg gaccgaactt 139680 
ttccgggtgg gaggggtttg aggctgtgaa 139740 
gtgctgaaaa tgtttcctag atcttgacct 139800 
tttaaaagtt atcaagctgt aaatttcagg 139860 
atatctcaaa aattctttta aaaactaaaa 139920 
gacatttaga

gggtggtttt 
atgaactttg 
ccagccatct 
tttccggatc 
tcccattagt 
gttggatttc 
tacccattgc 
ttagttctgc 
aaactgaata 
acattggaag 
cacccggtgt 
cacatttctg 
ttaagtggaa 
agcaaacaaa 
aatgactgtc 
cagcttgtac 
acaggaaaat 
ttgagaagcc 
ttaagaaaga 
tcaaatagct 
atacaagcaa 
catgagctga 
ctcaacgtac 
tatgatttga 
gaccttcatt 
agccatgttc 
taaccacggt 
ccgtggttgc 
actcatgtgg 
tctaaggtga 
ccacagcaat 
cagagacttg 
taaccctctt 
cctttcactc 
tgcatgcttg 
ctctgtttat 
actcacttgc 
tgacggtgtg 
cacacaagtc 
atgccggtgg 
tcctattgtt 
ttttttaacc 
agtttaattt 
tttcacattt 
tcccccctag 
agtagttggt 
gactttagaa 
accctggcat 
ccagctgttg 
aaatttcccg 
ttatccccag 
ttgcagctta 
ccaaacttgc 
taatgatggc 
aggaagaaaa 
acaagcatct 
gatttgaaag 
ctcttttcca 
tacctccttg 
gaatcaatct 
cagcgtattt 
aaacagcttt 



aatgaaatgt 
atttgtctct 
gaaaagtaat 
gatgagggtt 
aagcataaaa 
ctaaaaataa 
agtggcttca 
ccattggaat 
tttgagctac 
aaataacttt 
ccattgtcat 
gagtcccagt 
gttggtaaaa 
tgagttagca 
ttggagtaag 
agccactgga 
ctctgtaaag 
tagcatagag 
aggtgtgcct 
aaataaagga 
actaaattgt 
ctgttggcct 
gaaagtttga 
cctcttcctc 
tatgtttatc 
aagatgtacc 
cttgcaagct 
gtacctcgga 
cttcgggcta 
cctatttctg 
actagtttta 
tcgggaagta 
aatcattttc 
gccttgcata 
cactgccata 
aatttttact 
caaaggggcg 
ctcatcagta 
tgacacacag 
actagagatt 
gaatgcgggg 
aggtttgttt 
aaaactcact 
tatcagtggc 
aagtggtcca 
tggaagaacc 
tccactggaa 
ggccaaagtt 
tttcaggtgg 
gtcccctgcc 
aaaatggcag 
tggtcgctag 
cttgcatttc 
aaggcaggct 
cgtaagtcat 
gtcctcgtgc 
ctctctgatg 
accagagtga 
ttggctgttg 
ttaacctctg 
attcctcaaa 
tctgaagagt 
cagtcagtcc 



ttgacaagtt 
gcagttttca 
tttaaagtaa 
aaaatgtata 
ggcatttgct 
ataataaata 
agtataggaa 
aaatacaaga 
tgcaatgcaa 
gttcatattg 
tacagattca 
tcagttcctt 
caggtatata 
caattcccaa 
ggagacagtc 
agaattcata 
ctaggttttt 
cgtaaaatgc 
aactggcaaa 
tctttttttt 
cagaacataa 
agagataaat 
gaacaactag 
cccttccccc 
ggaaagcgag 
cttccgtgtt 
ggctggttag 
agaagcgttc 
atgcgttttt 
cttgtaattt 
taccattaga 
aactttcagt 
ccataattag 
tgtgtctctt 
agttataaaa 
caacaactgg 
cagagtcaca 
gaggggcaca 
cacagcagca 
cccatctcca 
ccaggggagc 
gtttgtttca 
gtggttactt 
agtgaattct 
agtgccagct 
tacattggag 
tcttggcctt 
agaacctacg 
tgtgtgtaga 
acttagatgc 
tgataactca 
ctctcctctg 
agtggtccca 
gatgagccaa 
agaaggtgaa 
tcacagaagc 
atgctggaaa 
gcagcagata 
ataaagtttc 
gttataacct 
agttggttct 
ggatttcatt 
aactccacag 
ttgttgtgac
tctgggagca 
ctatcttaga 
ttcgtaatct 
cttggaagac 
aataaatgtc 
gaaaatgatt 
ttactctgag 
gtgtttctga 
aattgagttg 
tcattacagt 
aggaactact 
aatacctact 
actgtgtgcc 
tgaatattca 
gctcaaagca 
ggtggttgct 
aggtggcagt 
tatattccat 
aaaactcaat 
atacttaaat 
agagaaaaaa 
cctagaacct 
aatccaggta 
taagtcaaaa 
ttcctaactt 
tcctgtcttc 
tcagagcaac 
ggaagtgtag 
atagtttgcc 
ttatacagaa 
gtgattctcc 
tttcagttct 
gcaatggaag 
agcatgccat 
aaggggaaga 
gtagcacttt 
tgttcctatc 
ccatttctca 
ctgactcact 
accggaagaa 
gtgagtcatc 
ctctagtttg 
tgaacgcttc 
taaactttgt 
ttttggctgc 
ctcctggagt 
tgaaggcttt 
cctgacagga 
ctttcatggg 
gaattaggat 
cccagctcga 
gcaagcactc 
ggtacagaaa 
gcttaatgca 
aagctcccat 
aaaagaaggt 
ggcccttggg 
atcacttttt 
ggggggaatt 
gggcatcacg 
agaaggcagg 
atttcagaga 



actatgaccc 
cctaaatcat 
gaactgtgga 
gacattccaa 
caagaaagaa 
tgagtcatgt 
tgtgctatta 
aaaagtgaaa 
cttttgagac 
gggaagtagc 
acagatttga 
ggctcaatac 
gtgctatgct 
aaggcaccct 
agggcaaccc 
gctcaccgtt 
gtttgtgaaa 
gtccaatttg 
tcatacgtca 
ttatatgtat 
tatttggagc 
tagtgaatca 
tgcacttggt 
ttgcctttaa 
agaactaata 
ctgaaatcac 
tttcaggtga 
atgcacgtgt 
attggtgcca 
tcctcaccat 
aagccaaatt 
aaatgcttgt 
ggaggcccgc 
ctagtggaaa 
tcagaactga 
agttatttcc 
ggaccaccgt 
aaacagtgag 
gagctgcagc 
cggtgggaac 
gggagccgtg 
ttacccccat 
ggttatgact 
cctcagttgt 
ggtttagtgt 
tctttgggat 
ggcttgaggc 
gtccaaaatg 
ctctactcgt 
caagtccatg 
tttgatcctc 
ggaaaatgct 
aagaggaaag 
gtacacctga 
atttagaaag 
tggcaaaatt 
gctatcaggg 
tccttccttt 
agctgcctgt 
atgtttaacc 
gaataaacac 
ttagtcttgg 
tagcagtaca 



tagttactat 
gcctaatgaa 
ttaaaccatt 
aacacgattc 
ttcatgtggt 
attggatttt 
ataatagttc 
tcgattgaat 
atagtataaa 
gatatttgtg 
gaatcaaaca 
tttctaacat 
agtgtgaaaa 
gggtccttgc 
agcagtgttc 
tcaacagtat 
agcaagtgct 
attccaaagt 
ctggggttac 
attggtattt 
taaccgctta 
ctaagggtcc 
aggatataaa 
ttgtaatctc 
aattgtgtaa 
taggaaaaac 
acagcattta 
tccgtgtgta 
gttttacaaa 
cctcacttgc 
tacttgcatg 
ggtaaaagta 
cccctctctt 
tttcctctcc 
cgttttcttc 
agagatgttt 
agaatggctg 
tgcctttgag 
aacactggtt 
aaaagctccc 
gcagaggttt 
tttttttttt 
cctacgagcc 
agaaatttag 
ctttactgaa 
tcaaattatg 
ggcacacttt 
cctttcctgc 
gcgtcacttc 
ctagcctagg 
atctcaacca 
gggtttttca 
atcagaccag 
tggatttttg 
atttgaaaaa 
attgtgtata 
agggagggaa 
tattcccacc 
tgtatttcac 
agtacttaat 
caagaccact 
aactcagtca 
agaaaaatga 
tacatgggtt tgtacagttg agtgttgaag tgttgacttc cataaaagaa cacaaatgtc 143760
aatctacagc agtaccatgg gtacgtaaat tgtgttatac agcgaaacac tgcacggtga 143820 
tggaaaagaa gaaactttaa catgcacaac aacatggatc catcacagaa ataataaaag 143880 
agaccaaaaa ggagtatgta ctgattgttc caattacatg acattcaaaa cctagcaaaa 143940 
ttaacccatg gtgggtgaca gagcagagtc aggattggcg ttgagaagag ggggatattg 144000 
acgaggaggg gcctgaggaa gccacgtgca gggtgggaag ccttctgtat cttgagctgg 144060 
gcagtcatta cacaggtgcg cacatatatg gatgaggagg ggcgtgaggg agccgcgtgc 144120 
agggcgggaa gccttctgca tcttgagctg ggcggtcgtt acacaggtgc gcacagatac 144180 
agacaaagag gggcgtgagg gagctgcctg caggggaaga agccttccgt atcgagctgg 144240 
gcggtcatta cacaggtgcg cacagatatg gatgaggagg ggcgtgaggg agccacgtgc 144300 
agggcagcaa gtcctctata tcttgagctg ggtggtcatt gcacaggtgc gcacagatac 144360 
aaaaatttac caagttgtac actcaagatt tgtgaatttt attctgtgta agttatatct 144420 
aaaaaaagaa aagaaaagaa aaagctagat tccctaaaac agagacagca gggctcgagt 144480
ctgagctagc tatagccatg ccagcagcta gatccatgaa aaggttgggg ttggctttgc 144540
ccaggtgatc attcggggac gggggacgtg ctgtgaatgg aagatgtgcc tgctgtcagc 144600 
actgatgttg cccacccttt atttctacaa cgctgtcttc aaaagaatta catttcaatt 144660 
ttataccaac tatcgtgcct cctcatgaat cccttccccg cacaacctgg aaaccctcgc 144720 
ctggcgtcgg ctccatctcc agatgttact cactggctac cgctaggtgg ctgcgaaggg 144780 
tggcggcgtc actgatgcgc actcaggcag cagccatggg gaggttgaat ccccggggca 144840
tctgcctctc cctatgtgtg tgggtcctgg gagtgaggca gtgtggcgtg gggctgttgc 144900 
acacaccccc gactgtaggg ctgcacccag acacgtgcgg tgaccccgtc tctacagccg 144960 
cttgttgccc tggcaccaag ccaaccactc agcatccagc gcgtcctcac cctccctccg 145020 
gggtgaagcg gaaacaaggg tatgtgccaa aactggcctg ctcaccattt cccagatttt 145080
ccacatttgt tcccactcgg ggtgaggggt gtgcttctgg tgtgacagct gtgggctgtg 145140
tagggtggcg ggcgttggtg gtgaagtctg tcggccctcc tgacccacac acgagggggt 145200 
gtggatttta tattgaaatc tttttaaaat ctgttttttt gtaagaggct ctgaaaggaa 145260
gaaattttat cagagttttg cggcctgtgt acgttctgat acctctcaga gctggagttt 145320
cttacccata taggacaagc tgttgtgaaa ttgagtgaga cgatgtaagc acatggcgtg 145380 
cacctgataa atgccagctg ccaccacagt gatggtcagc agcgtggtca ccactgtcgt 145440
ttcacaatta cagcccaagg agcccaaggg gaaggagtgc ctctctctgt tttgaccttc 145500
tctgactgct gtcctaataa acagtgtcct ttctacaaga accctgtaga cttttgaaac 145560 
caacaagtga aggcactcca aggcccttgt tttgagaagg ggtaagtgtg ctaggtaagg 145620 
gatttccttg ggtgcttacc ttccacggct cctgggcccc tgactcgaag ctgaccatct 145680 
gtgctgatgc tgacttagga ttttaaatca cttaaatttg agctggatag agaaagggtc 145740
ctagttaagc tgagagggct gcttattcgt gatttttttt ttcttctttc tcatgcagag 145800 
actgtttatt ttagtggtag cggtatttag gggtgaagaa ggggaaagga agaatagtgt 145860 
gccatcaatt aattctatgc atgtcagctg caacgccttc atggcacggg acaggccaat 145920
tatgtaactg taaacaaatt atatgtatta aaagttgtcc aattaaagga aaaaacatgc 145980
atggatttat gtgtttgtta ttacccagaa gggagccatg ctgtacttga aaatatgcaa 146040
aatttcacat cacaaaatca ccagttgttg tttgaggggc tggtgttctg attagtctta 146100
atttttttta actcataaca tttttgtccc agtcatcaac actgttaaga acatgtcact 146160
ggtgcagtta agttaaaaat gattcaggtc aggaattcct gtcattaaca attttttata 146220 
ttaaagttgg aaaagtttaa ggaaatttaa gaacctattc cttaatagtt aaaaatagta 146280 
aggaatttca tataccccca aatattaagc ataggtaatt agcttgtggt tgggatttga 146340 
tggttttctg tttttcagca aaatacaata acgtactttc tcgagcagaa tttttacacc 146400 
aacatttccc attaagacca gtttgtttag ggaattttta agctacatct gtatgtaata 146460 
attttttgag attccaaaga ctacgcagtc taataaaact ctaatacttc aactatcttc 146520 
agactaatgt ttataattac ccggtagatg accaagaatt gatatcatct gttgattcca 146580 
gaaattatgg cagagaaaat gctgtcagga acccaaagaa aatcagagga aatggtacct 146640
ctaagaaatt ctgaatcttt tctactaaga tatgtggctt gactgcttaa ccccaaaatg 146700 
cctgcttaga aggtagtttg gggctatctt gtaatactca tttagttcct gccttcttct 146760 
gccatagaaa caacatgcag aagcagcatt gcttacgact cacactgaac ctgaagggat 146820 
gaaattacat atgacgatgg aatgtggcca tattcacgca gtcacagcag tgtgttgccc 146880 
aatgacagta ctggagcagt ttccacagag gcactcatgc aatatgcaga atacagacat 146940 
tttacacaca cacttacgat ggtccttttc attgtcgaaa aggaattcat tatctttcga 147000 
gtaaacatgt gctttgaggt atataactct gaggtataga agttagaaca tttaacccga 147060 
ttagggtgac tggaattata acctttaact aatgtgagat atagtataga tcttgataag 147120 
tgtctttctg gtgttcctat taaaattcat tataattacc gttcctgcaa ttgtgtagca 147180 
tcttacagtt tccaacaccc tgtgctagcc atcatcttat ttgaaacaca taataaccct 147240
acaagttcac tgatgtaggt aagaaaactg gtaccgtttc tgaagataca cagtgattgt 147300 
ttcggccagt taattaaggc aagagatcac tcaacaattg ttctacagtt attcctgctt 147360 
tttttttttt aactcactca ttaagtgaaa gaagccagtc tgaaaaggct gtatatttta 147420 
tgattccaac tgtatgacat tctggaaaag ggaaaactga agatagtaaa aggatcaggg 147480 
tttgccaggg gttaagggga agaagggctg
agtgaagctg ctctgtgtga tgctacaatg 
acccacagaa tgtacagcac cagatgtcaa 
atgtagattc atcagttgta accagtgcac 
aggtggctgt gtgtgtgcca cggagggggg 
tgtggctgtg aacctaaaac tgctctaaaa 
atatgaaaag gaacaagtaa agcaacaaca 
accagtaaac ctagccagcc ctcactaggg 
ctagcacacc gggagaataa cttcagaggc 
cacacgtcaa cattttttta aaaattagat 
tcttattttg gtatagaacg cctttattca 
gccatagttt ggctaaattc ctgaggctgg 
tctcaaaatt tttttttttt tttttacctc 
gggcctacga ggccctcagg cagaggggaa 
tcttgagcag aaaagcgaat gtcagacggt 
gaggatggca ctggctgtgg atttacatgt 
tgctgttgaa tataactacc aagatatggt 
gtagcatgtc cagttggata ctgttagtga 
aaccttcatc cctgaaattt gccggaacag 
aatactgagt acttcagaca gggagatatg 
catattagac taatagtagt cttatcacca 
tcatcaagag tgttaacatg ggagtaagtg 
gtgacaaaat tgagaaatag accctacaag 
aatggaaagg ctggctgctt gcttcctttt 
gccaattatg attataattt atcagcccac 
tcaacatgat aaaataatcc atttcccaag 
aaaagctcca ttgtccataa aaaattataa 
ttaatgtata tttttaattt ggttgttggt 
tatgaatatt ttgtggtggt aagttgtcag 
agacatgtgg aaagttgctc agagggagaa 
tcagagcagc cctcggaggg agcgggagag 
ttgcaaagag aaaggtttta gctggttgca 
gttctcagga gttttttaat aggtttcaca 
ggacagaaaa gaaaggtgat ttcatggaga 
gatggcacac ctgacctaga gtccaggcag 
aaaccacaca taaaacagtc attttaattc 
cactactcat gatttttttt actcttttta 
tttcttttta agcattaaca taatccaagt 
gaatgtgaga tggacaataa caatcaaacc 
gccacatgga acttccctga ggctgaattt 
gaaatccagc gtttccccct gtcaacttcc 
gtaagaatat cccatgtata cttcctcttg 
gtttgccctt tttttttttt ggagacagga 
tggtgccatc ctggctcact gcagcctgtg 
atcctcctga gtagctgaga ctacaggcat 
ttttggtaga gacggggttt catcgtgttg 
gcgacccacc cgcttcggcc tccgaaagtg 
agccacataa atttgttttt agtcttctga 
ttgcaccaat tctattacaa ggtggaattt 
cagttgcttg tttttgcttg ttttcctagc 
tggcatctca cctatctaga ggtgagaaca 
aagtctgctg tggggacttg gtatctcagg 
cgtctcaaga tcgatgtccc agtgggcgag 
accagtgact taacttctca ggctcacttt 
aaagaggtaa tttaaaatat gtactatata 
tacatatagg cattaaatat taagaatgtt 
tttacttttt gtcattcttt gtatattctc 
gtaataggaa tagcaaccat tgagaaatag 
ttccgtgtag tgaccaagga cttaacatca 
ctccctttgt gctgtgttta atttctcatt 
ctagctgctg ctacactaca cgttctgacc 
ttgggaattg tcttttttta tttttttttc 
ctggagcgca atggcaccat catagttcac 
tgcaggtaga gcacagagga ttcttagggc 147540
gtggatccat ggcttcatac attggtccaa 147600 
ctgcgggctc tgggtgataa tgatgggtca 147660 
cactctggtg caggatgttg atcgtagggg 147720 
atatgggaac tctctgcact ttactctcga 147780 
aacatagtct tttaaaaaat catttactac 147840 
acaaaatgtt attgtgtact ttcagattgc 147900 
tcttctgatg gttacatagt taaaagtaca 147960 
ttgctggtct aatggtaatt gcgtcggctt 148020 
tttcttgaat ctgatcatgt ccaagatacc 148080
aacaacggga gaacatgaac atatcccttt 148140 
ctggggccag aaacaaaatc cctgaaatgg 148200 
tccccttttc cttctggttg gtggtctttg 148260 
atggcagttt ccccatcccc ttttgggact 148320 
ccttataaag tcccacgtga ttcagccact 148380
aagacaactt catggcgtat tttcgccttt 148440 
ttgggcagac aaaatagaaa tcttctgtgt 148500 
catagagaga cgagcgcaca actcaggttt 148560 
tcataatgaa ggtgctaatg tatttcctga 148620 
ggtggtatct agtagccttg tgataagacc 148680 
gattaaacca cctggatagc ccacctcaag 148740 
tgacaaatgc ccaggtggtc tggactaaat 148800
atctggattt taaaaagaga gaaaaaaaaa 148860
aagactttgt tcacgttctc gcccccaaaa 148920 
aggaaatgat tgcttctcta tgagacatcg 148980 
atttctatat cttagtatct catctcttta 149040 
aattacatat ttttacatga caggtaattt 149100 
ttttaaaata gtaaaatatt aaatatcaac 149160 
gttaatgtaa agattccaaa aataattcac 149220 
ccagtctgat tttggagaaa gtaattacca 149280 
tccacaggtt tcaatcaggt tctagatgaa 149340
ggaggggctc tggtaaaagg attaagtcca 149400
tcttttgtca actggtgcaa ggaaggatta 149460
aatatctaat taaaatatta aagatagtcg 149520
tggtaggcag agttccttcc cctttttttt 149580 
caacaaatgg ttcatactgg tattctaaac 149640 
tttacatcaa atcattcaac ttcacatcat 149700 
gccaggccat ttttggtgat ccaatctgta 149760 
gttttcaaac tctaatagtg ggaagagaag 149820 
cgtcgtcctg cctttcaagt ggtgtcctgt 149880 
agaacagggc tgtaactaga tgtatggttt 149940 
gttataacat aatttgtttt gcggggggtg 150000 
tctcactgcg tagcccaggc tggagtacca 150060 
cctcctgggt tcaaatgatc cacccacctc 150120 
gtaccaccac gcctgggtga tttttatatt 150180 
gccaggctgg ttttgaactc ctgagctcaa 150240
ctgcgattac aggcatgagc cactgcaccc 150300 
acgattaaat agttgtacca attataccaa 150360 
cttatcgttc ctttacaaac aggatattcc 150420 
agcttcagca ccatcctcac atagaagggc 150480 
aagctgtgct ctcagcaatc ggaatctgtc 150540 
cctgatgctg gcctaggagt gccctgcact 150600 
aattgctgcc aagactaacc aagggtgtca 150660 
tttttttatt tttaataaaa acaaattgtt 150720 
ataagtacta cagcatatac agtgtttaca 150780 
tatttcagaa tcatataatt atacctgata 150840 
cattttttgc agtatctata tattacttgg 150900 
ttctaattga ttttcctttt ataaaagggt 150960 
tccccacccc acagtccctc acacgcctga 151020 
tcattcattt acccttctgt gcagcacata 151080 
aaagcatagt gtccccctgg ggcaagactc 151140 
attttttagg gtctcacttt gttgcccagg 151200 
tgcagccttg acctcttggg ctcaggcaat 151260 
cctcccacct cagcctccca agtagctggg
ggtctcactc tgttgaccag gctggtcttg 
tggcctctca aagctctagg attacaggtg 
cttccccctc tgacttcttc taggcaccta 
ctcgataaat attttttgaa caaataaatc 
accccaccta ggaggggtgg gcggggtcat 
gagaaataac gcacttctgc ccaaattcat 
gcagtcttcc cagccccacg gaaaattctg 
tcccgtgttt gctttaccgc tgggcaatcc 
ggtgttagca tgccccgtgt ttataaggga 
atacctacca tacttgtgta ctagagatat 
agtgctttaa ggtgactcag agccaccctg 
tggctgcaaa gttcagtggc tgtctttata 
aattctcatt gttaggttcc ttcaagccaa 
tggctgtcta gatgggccct gtttaattag 
aatcttacag ggccatccta tagaacaaat 
tgtgtgtgta cacacacaca cacacacaca 
tagaatggtt tgcctgctga cttgccatta 
agctaattgg aaaacagtct gtccgtgttc 
aactcaggaa atccacaaag ctgaccaggc 
agatataaac cacctccttt cttccctgtc 
cctcatcttg actgactccc tcacaggtgg 
acgcaccttt agttacctca gtctttcaaa 
cttccaacct ctcgtcatgt catttttggg 
cactgtacgt gttttaaaaa gaaggaacgc 
aataaagggc tggtaaaaaa atctctgagt 
gggaaagaga atctacttcc tatttccacc 
tgcccatatc ttactttcat aacatttttg 
agagttgatt tgaactcttg tttttcaatt 
gattaaaata agcaccccag accatcctga 
ccttctcaag tttcatctac ggtaactggc 
ttgacaactg taaagagcca cattgattaa 
gactatatgg attatctagt gtctcaatag 
ttctaaactc tgcaagcaaa caatcatctc 
agtatgtcag acaaccccat ttagtaaaca 
atatgtataa tatataatcc atatagagta 
gcacatatca tactatacat atacatatcg 
tttccaaaat taaatatgtt gcagttcccc 
cttactgcat tccttactat atagtaatac 
caggatgtag ccagcggata cagtaatggt 
ccagtggctg tgaatgggat gggatttttt 
ttattgttgt tgttttaata tacagctatt 
cacacacaca cacacacaca cacacacaca 
ctagcgggct gggttcccct gggagcccct 
atgtttgaag agcataactg catggtttcc 
tatattatta aaaagataca ctattattac 
atttacactt tccagcctgt tcttgtgttg 
cagaggaaat gtttgcctcc gtagtaggca 
aacgtgttca tagactgcag ttgtttatag 
ctttcctttc catatttcct cttatcagaa 
acaatccatt ttgctaaatt atttgtgagt 
cagaggttca cactagttgc aggattagca 
cactgcagtg tgttttgtgt gctagcgatc 
acacagtgag taagctgtcc tgtattgttc 
cctgacactt cctttgccgg acagattaaa 
aacctgcttt cttagtctaa gctccctagg 
atttatttgc cctaaaatga accagaaact 
taatttccta aaagtgtacg gattttgtag 
gctgatccag gagagaaagg agatatggaa 
aatatgtaaa accatgaaaa acccagaatt 
gtgggtgtta ggttgtaatg ataacccttt 
atattatttt cttcctcatg agaaaatcag 
agaaaaacaa acagtgaggt tacatttaat 
accacagctg cgtgccattg tagagatggg 151320
aactcctggc ctcaaagtgt cttcccatct 151380 
tgaggcactg tgcctggctt taggcatttt 151440 
gaaccaacac tgcctggaca tgtgaaggca 151500 
aacttgcatg gctcctgccc caaactggaa 151560 
atggtgttca ctcacttacg ctaatgaact 151620 
gttcattcac actcctctca gcagttttct 151680 
cttttgtcag aggaggggat atgcgtgctt 151740 
atacaaggct actaaactgc agagggtact 151800 
cttaaaaaaa tatacaggct tgcatccacc 151860 
tctcggggca aaatgaggtg aggtgtggaa 151920 
ttgcgattgc tgccttcgtg atgactggtg 151980 
tcagaataat tctagaataa tttaggagaa 152040 
aggaggatgt agtgaaaaga gaataggtgt 152100 
agtcgactgt atcagttgcc aaatgaagcc 152160 
atatattttt tatatttaat atgatatata 152220 
cacacacaca tacatatata cagggagaga 152280 
agtaccgtaa acatcctgga aattgtgaac 152340 
atgattcatt gtatgcatcc tctagatctc 152400 
cctgctgtca ttttgtggcc agatatggaa 152460 
aaaacagttg tgccacgtcc tccccctctt 152520 
tgtctctgtc tctcctgccc ctgcccccac 152580 
ttttgctctt tgttcctaag tacagtcttc 152640 
ccaggaaaga tcctgattat gctataatgc 152700 
tgtacatttg atattaaatt tggcatttta 152760 
gctaatctcc aagaaaggga tggaagactg 152820 
attttaatag cctgacatat ttttttacct 152880 
ttttattttt taaattactc ccatggcggt 152940 
ttaaatgtac aaaatttcaa ttattttatg 153000 
gcatctgatc accaatggta agaccattat 153060 
ttacagataa acttgtggat tacaacctgt 153120 
aatcagaaga ttttcagagt tcagtattta 153180 
aaggtaaggt tatggaaatc catttcctag 153240
cccatagtgt gatatctaaa tagttaatcc 153300 
aagactactt gaccatagaa aacatatgat 153360 
aacatgtatt atattttata tactgtatag 153420 
cataagagat acagtaaact atatttgtat 153480 
taccatagtg aaactgtctc ttctacattc 153540 
taacactgag cacaatcata tttcaccact 153600 
tcttgtcctc cgcaggagga ccacgggaga 153660 
tctttcctct aatgaaccaa gccctgggtt 153720 
gagtgttttg tagccacaca cgacaacaca 153780
cacagagtcc ctagcaaggg cagggtgggg 153840
caccatccgt ttctcccagt gacggcagct 153900 
tatgcattca ttcgtgagta gtagctctca 153960 
ttttaaagaa agaaaaggat tgcaattcac 154020 
tttaaaaaac aaacaaacaa aaaacgatgg 154080 
tcaactttat ttttcaaatc attctgtttt 154140 
gtatgaggca ctcatcagtg tgaaatagtt 154200 
aaaaaaattc ctgtggtctc ctagcaaaat 154260 
ttttataaag tgtgtttaat atcaccaggg 154320 
agagagacgt agcatgagta gtgtttggtc 154380 
atgagtttat ctgatccttg tttaactact 154440 
cattcatatt cctctgagtt cattcagaag 154500 
ggggcagcgt gggacctttt gatgatgtga 154560 
ctatgctgac cactcagagg ttgaactact 154620 
tggtcttagt ttccttcctg acacatgttt 154680 
tgggttgttt ttgaatcttt catttttagt 154740 
acattttttt caaaaaatag ctcaaaagaa 154800 
gtgctgctgc tttctgtgct aattaaatca 154860 
aactgtgtgg cttatctctc attccatttt 154920 
tgtttattat cacaggtgac aaaacacagg 154980
cactttaagt gggtttcatc tttgcttttt 155040 
tgttttcatt cccaagccag aagccgtaaa
gtgcacgttg ctgagatagg ctgggagaac 
aagtcattct tggggaaacg gtatttagct 
attgtgctac tgctgttcaa atattgacta 
gcctaacata gaactcgttg tctttttctg 
aaggggtagt ttacctaaga aagaaatatg 
aatgtaagtt aattaagcaa atacaatgta 
atgtttcctg gctgtttctt cgaggagatg 
ctgcagtgca tctttcatgc ctcgctgtga 
attaaccatc agcttgacgg tttacaaaga 
ggatatagtt tgccttccac atatttcaac 
gtaattattc acacagctat gaattttaga 
ggaaaggagg gaaacttact ggagcgctta 
gttttatgtc acatgaaatc tacatttgat 
aatgactcct cttgacacag taaccagtgg 
ttaagctcat gactcaaata ataacttagt 
gaggagcacc gattaggctc caagatccgt 
aggtttatgg tgaaatgaaa gagtgagaaa 
aaatgaggct caccaacttt taaaagactt 
catgtaagca ttgtgatgta atggcatcat 
tgtttaaact tcctttagaa tatatatagt 
gattttgtat aaatagctcc catgtttaca 
cagagctttc aatatcctat tttggttacg 
tcatctgtga aacagtggaa tcaccccaag 
ggcatgactt gaacataacc gtcccacgtg 
gacgggggct gcagtgctga atacctctgg 
acggcaagcc cttagtggta gggccctgag 
aatttcctct cgtttaccaa gagtcacaac 
taaagcactt tactccatcc gttatgcctc 
ggctcacggg ctgctgcggc tcacagcctt 
tccctggtga tgggtgttac tgagcttaaa 
tgtgcaggga agtatatttc ctctaccttg 
ctgtattgac tctggcattc tttccaaata 
ccctgggatg aataaaagaa attatctttc 
ctctaatcct cttcatcctc cttctttttt 
tgttaaaacc tgagctgctg ccaagctgat 
tttttttcct accttcatta gccactgagt 
cagcctctgc accgagtcat cgtattcgag 
gtaggggctg gaggaagagc ggcagttgtc 
ggacccatgc tggacctgat attgcttctt 
gttataggct gcggccaaga caagatcaca 
tctttcttca gtaataaacc agcagcttag 
cccgagctgc tgccgtctra aaygcagggc 
gaaagtcttc tctttcctct ttttccagta 
ccaggcatgc agtaaactgt cagattgcag 
gctgtgtcag cttttacaga gcagctttca 
acagtgtcag tatccgaatc aatcactttc 
gccagacatg tgtagtgggc tgctccgccc 
ggtcactagc caggcaacag gaaaaatcag 
atgctgcaga aaccctctgg gaaaaaccga 
gcaccacccc aattaagtat ttcctagaga 
tttttgccgt gctaagctgg caaaatatct 
gaaagttcct gttttttcct ttaggaataa 
tatttttgtt attacaaata aagaggaaga 
agagggaaaa atagattcat tgattttagt 
agattttcat ttccgaacag ttggatagaa 
cagtttctga gctctgacct ttattcttta 
aatacactgt actgattaag ttcattacat 
gtgtagaaag aggaatacaa actttttttt 
tcgctctgtc gaccaggcta gagtgcagtg 
ttccgggttc aagtgattct cttgcctcag 
gccaccacac ccggctaatt tttgtatttt 
tggtctcaaa ctcatgacct tgtgatcgac 
ccgagcgaga gtgcaaattg cctttctcag 155100
aggtgtggag cccgtgaaaa gataaacatt 155160 
agacagctga agacggactt ttgaaatacc 155220 
agtgaacctg gaaaggaaga aattttggtc 155280 
tctttaaatg ttatctcaaa gacccaagag 155340 
agctttgctt atggagtttc aggtatacct 155400 
gcagccttgc atttggccta gcattctttt 155460 
acctgcctgt cgggcagatt agaatattta 155520 
ctctgtaacc acggtggatg tgggaaagcc 155580 
aataggaagt tcaagttaag cagatattta 155640 
ctgtgttgct gcatactttt taagcttagc 155700 
agatgtttaa aagcaaacca cagtgacctg 155760 
gccaggagct taaaaagaca ttgctagtga 155820 
aggtcatttt ggtaagtttt tgttgtttta 155880 
tgctgggaac attcattcac attcattcat 155940 
cgtttcctct ctgaaggtag gggaggtaat 156000 
tctgagattc agataaggtg tcctaacaaa 156060 
ataattgtgc tttttctagg gtcatgcgtc 156120 
tacatagctt tagataatca cattccctgc 156180 
catgctactt aacaattaat ttatgcattt 156240 
ccatataaag aaaattccag ggtcgttttg 156300 
tgtgaaaaaa aattatttat gaaagaaaaa 156360 
tctccataaa aactctagga aacagtggga 156420 
aacaaactgt cagacagacc gtcctgtcgt 156480 
gggacgcatt ccgcaccggt tgctggaact 156540 
gacgcttggg aactgtgccc ctgtttacag 156600 
attctgagaa acataaggtc tgctttattt 156660 
ctattttagt aaataaattc aggaaattgg 156720
ggtcatcagc atggttgtca cggtctctct 156780 
ccctcacttg cctgcaacca gctgagagcc 156840
cgatgtaaac aaacagaacg gcacacaagt 156900 
ttaataaaga tttctaactt tagagatttt 156960 
attattttca ccccggggac tacccacaca 157020
atttgagggt accagcaacc cgctctccag 157080 
tatttttttt tttttttttt tttttttggt 157140 
cttaatagca tgttcacaaa gacagatgga 157200 
gttgttttcc atgatgttct ccagcacttg 157260 
cggcgcgtcc ctctgcacag cattggacac 157320 
catctctggc aggaggaaag tgtagctgca 157380 
tcctatgctg tccatgctct tccgaaagtt 157440 
gctcagagta aagaaaacaa tctgccacat 157500 
caaasttgag ggcaaacaca cgtccagagt 157560 
tgctacgctg ccatggctgg gtccgtcart 157620 
gcaaacctgg tttttactgc tgtgttctct 157680 
tgggaagaac agtcctgctc acttgggagg 157740 
cggtcctttg ttcctctctc cccagatcct 157800 
ctttccttat atgataagtt gataagagca 157860 
tcctggctta gccgttatct tcctgtaggg 157920 
agcagaatgc ctgccctcca accaggaccc 157980 
tctgttacag gacccctggg catttcctag 158040
gagcagttga tctcttttgt ctgaaactga 158100 
gaggtaataa ctttaatgtt gaagtacaat 158160 
aaatactaca aataggtcag gacttcggtt 158220 
agtttggctc ctgtaaacgt gtgccttttc 158280 
tgattcttga accactagcc aagttacaaa 158340 
agatctgtta ttaagtcacg ttagaaacat 158400 
aaaaaactcc acttggatat tcactctaaa 158460 
tacaatagag aaattagaat ttaagtgtct 158520 
tttttttttt tttttttttg agacggagtc 158580 
gggcaatgtt ggctcactgc aacctccgcc 158640
gctcctgagc agctgggatt acaggcacac 158700 
tagtagagac agggtttcac catgttaggc 158760
tcgcctttgg cctcccaaag tgctgagatt 158820 
acaggtgtga gtcaccatgc ctggcccaca
atctcagcat tttgtaacac attatcaatt 
caagtcactt ccctgtttta attaagctct 
cactcacaag atttttcacc ttcaatccta 
ttaacaacta aaacttaata cttcagagat 
tacataggga aaaagccccc tgcctttgct 
ttattattcc cagcgagcgc tgaatagctg 
ataaaagatc tttatttttt actaagctct 
aatagttaac gaacaacagt gtgaatatct 
ttttatttgc tgccacaata ctcagaacta 
ctactgaaca ttggaataaa gtagcatgtg 
accagacaac caaagggttc tgtcacacag 
tcattagtag aagaggtagt atgattccac 
aaccaattaa agatgctata cccatccgga 
ttaaagatga gacctgtgtt agaaccctgc 
tcttggagcc tcagtctccc tagtcataag 
taggattaag atataattgt atgtttagaa 
ccattaaatg gggatcagtt cttccaccat 
gcaaagcagt ggcagtggca tagggtacaa 
tgtaaggtat taatatcaca tatgtggttt 
acggggtctc actcttgttg cccaggctgg 
tgcttggcct cccaaagtgc tgggatcaca 
atggttttaa aagtcattca gttgtcttcc 
tagaaagagg gcacaccaca gtactttttg 
tgcactgggg atagagatta aggcaactgg 
ggagagggag agacccataa acaaattgga 
gaaagacagt gatgcttccg ggtgcacaat 
cttttcaaat tccctcaagt tccgaacatg 
ggaaattatc ggaaccagag agtcagggga 
tgtccagtgt cagatggtaa ttattttcgg 
cactcaaagg taatgcggtg caaagcagct 
gggctccaat aaagctttat tcgtgatact 
atgaaagatt ctattttctg caatcattta 
ctgttagaaa ccgggcctgg actgtagctt 
ggggtaggag gtgctgggca aggccgtcca 
ttggcaaggc tgtcagcagt acacacagtg 
atccacagtc cacacagtgc aataaccgtg 
ccgtacagcg accgtgtgtg gtcagcaagg 
cctgtggtcg gcatggccat caacagggca 
ggccgtggat agtgcccgtg gtgtggtatc 
catgtggtgc ggtgaccctg tgtgattggc 
accgtgtgtg atcggcaagg ccatggatag 
gcatgaccat cgacagtgct tatggtgtgg 
agtgagtgca cacggtgcgg tgaccatgtg 
gtgcggtgac catgtgtgat cggcaaggcc 
tgtgatcggc aaggccatgg atagtacacg 
ccatggatag tgcacgcggt gcggtgacca 
cgcggtgtgg tgaccgtgtg tgatcggcaa 
cgtgtgtgat cggcaaggcc atggatagtg 
aaggccatgg atagtgcacg tggtgcggtg 
gcatgcagtg cggtgaccgt gtgtgatcgg 
accgtgtgtg atccgcaagg tcatggatag 
gcaaggtcat ggatagtgca cacggtgcgg 
agtgcacacg gtgcagtgac caccgtgtga 
ttctgtgtgg ctgcttacag gggcttacta 
ccacactgaa gtgaattaag gatgccaggt 
gcaagtgtgc atggatgggc gctcagctgt 
caggccactg accaggaagc ctcggggagc 
aaagcctgga gttgtgggtg tccagtaaca 
ccatctggga ccccttcact ctaaatgagg 
aaaatcttgg gtgttcctcc ctgctgtact 
cagatgagaa acattattac aggttcccaa 
ctgtaatagt tctgctgaac agctgtgcat 
aacttcttta ttgtgtcaga atttgttgac 158880
acattagtcc cccttggtat tagactcggg 158940 
aatgttctca tctgtgcaat tcaaggggtg 159000 
tggctctgta agttctacaa gtcacttcct 159060 
taataatatg ttaactcagc agcccaagtg 159120 
gcggtttgtt tatctctcaa ggtacaaggt 159180 
gtacactgac ttaacagacc acatctaccc 159240 
aaccgaaaga cagcctttcc cttatcaatg 159300 
gtgactttct catcctcaga aatcagctct 159360 
catttttatt aaacccagcc ctagatcttg 159420 
tcttcttttg agaaggtgtt tataggcttc 159480 
aaaagctgga agacatgctc tggaaggatc 159540 
caaggttctg gacatggttt ccactaaggg 159600 
cagtgcaccg tcgaagaaag catataggtc 159660 
ttctgtgtga cctccagcaa atgcttccat 159720 
atggagatca ttttttctct gtagggtttt 159780 
atatttgttc cttttcttta caggcatgct 159840 
caaatagtat aactctgcta ttctctgaat 159900 
tttttttatt tcctgtttga aaaagcatat 159960 
tacctttttt caagattatt ttttgtagag 160020 
tcttgaactc ctggcatcca atgatcctct 160080 
ggtgtgagcc actgtacctg gcctgcatat 160140 
aggcaaatag agtagtttaa aaggaacaaa 160200 
tctaccagcc ttgtgtcaga caccatgcta 160260 
gtgtctgacc taaagaagct aatagtgtat 160320 
tgtggtaaga gagagcagtc ggagcacaaa 160380 
atgccatgtg cagctggcat ggactccatc 160440 
gaaggaacag ctctttgtag attcctaaat 160500 
agctctagta gagccggagt tggcaaactc 160560 
ctctgcaggc tatacggcca ccatcgccac 160620 
gtgggacagc attaactgag tgagcgtgct 160680 
gaaatttgaa tttcttgtaa ttttcacatc 160740 
aaagtttaaa aaccattctt agctcatagg 160800 
gctggcctct gagagcctag gtggtctgtg 160860 
cagtgcacgc ggtgtggtga ccgtgtgtgg 160920 
cagcaaccac cgtgtgtggt cagcaaggcc 160980 
tgtggcgagc aaggctattg acactgtacg 161040 
ccatggagag tgcacacagt acagtgacca 161100 
cacagcacgg tgaccatgtg tgatcagcaa 161160 
catgtgtgat cggcaaggcc atggatagtg 161220 
aaggccatgg atagtgaacg tggtgcggtg 161280
tgcacgtgtt gcggtgacca tgtgtgatcg 161340 
tgaccgtgtg tgatccgcaa ggccatggat 161400
tgatcagcaa ggccatggat agtacacgcg 161460 
atggatagtg cacgcggtgc ggtgaccgtg 161520 
cggtgcggtg acgatgtgtg atcggcaagg 161580 
tgtgtgatcg gcaaggccat ggatagtgca 161640 
ggccatagat agtgcacgcg gtgcggtgac 161700 
cccgtggcgt ggtgtccatg tatgatcagc 161760 
accgtgtgtg atcggtaagg ccatgataga 161820 
taaggccatg atagagcatg cagtgtggtg 161880 
tgtacacggt gcgatgacca tgtgtgatcc 161940 
tgaccatgtt tgatccgcaa ggccatggat 162000 
acaggggagg actggtgcct cggctcagcc 162060 
acgggataga ataggtgctt agagaaagtg 162120 
ggggagaggg gccaggaagt ggcctgggat 162180 
ggcccctagg gaagtggaga catggtctgc 162240
caatgggagg cgcttgaagg cattaggtgc 162300 
ccaccatgca gcctgggggt ctgaggccac 162360 
cttgactagg gggatctcag aagtccacag 162420 
gacgggacca caagaggcaa gtgagactgt 162480 
aatccacctg cctacccacc caatttttgt 162540 
agtgcaattt atttccttaa tactgtttgt 162600 
tttctccccc atattctgtg tcggcaactg acatttcaga ggttcccatg tgttctctgt 162660
ggaactgtct caagttctta ttaccctggt tgacgacacc agaaaaacca tagctaccta 162720 
ctcccagaaa gaggccagtg ttacaaagaa tctcgtggcc agcccttttg gctcagtttg 162780 
cccagttgga ggccctaagg cgcaaaccag aaaagccaaa gggcctcctg aggaccgtgg 162840 
aagtgggtgg cgcgtggacc catcgctagc tgaatgtgga atgtggaccc atcgctagct 162900 
gaatgtggaa tgtggaccca tcgctagctg aatgtggaaa aggacttatg acagtcagac 162960 
catcccaggt tcccccagag caatccgtgc agctctcata agcaaccaga aaccaaaaaa 163020 
ggatgctaag tcagcacaaa gtggagcagc cccccagcta tgggttgcca aacagaattt 163080 
gcttgtgggc cccgtgaccc ctgctgttgt ccagtttaat gctcagcatt tatccagatc 163140 
aagggatgga aatggggcca ccagcctgac ccaggcccgg ggtcgttttg cttttccaac 163200 
ctgtaccatc ccagcaatgc attgccagcg tgcaatttga aaaagccctg ccgagctgaa 163260 
aaacacatgg gaagggctca gacacactta aaggcacatt gctgccctgc atttatacgg 163320 
cattttgtgc tgacatcgtt ttccatcagg cctgggcagc ccctcctgag actgtctccc 163380
gcctgccgtc ctcagcacgg cctgcccggc tacagtctgc tttcctccca ctgcccctgc 163440 
ctgcaggcct tggaggcggt gactgctgca gacttatttg ggcagcctgg ccttaatttt 163500 
tggaaagtgc cttgttgatg tatgaggaac ttccacggct gaaacagtct aaaaaaatga 163560 
agctgggaca ctatgttttg attttagcca tttgcagaca gaggggcaca ctcgggactc 163620 
ttgggcgcct ggcacactaa gctgggaggg acttttgaga catcttggcc atctaaatca 163680 
gtcaacatgt ttatatatac aatttaatgt tcagtataca gggaaaacca ttagaaggtt 163740 
agctgcacat aaaactgttg ttaaagttat ttttattact tccccccaca aatcgtatgc 163800 
aataattaat aagaactaga gaaatagcca caactggcac aacacctgcc cctctgccaa 163860 
aagaaaaaaa tcttctttct gaaggcaggc tccctatata gtgattcctt tatatgcctc 163920 
ctggaagatc tgtttcgact ccattttgat atatgttgaa ccagatttga agacccacaa 163980 
atgcagtcta gagccatttt gcaaaagtgt tgctgcatca accatttcca ttccccagtg 164040 
ctgctcatca tgttacacta gtgttaaatc ctgactttgg aatgcgagga aggacagttc 164100 
cagccatggg atttcaaaaa agtaccaaag gaaagcccct tcaagttacc gttaagacag 164160 
aagaaaagga agaaaaatat aaacacacac gtataaacat gtaaggtagc tttggtccct 164220 
ataacagaca aggaaatcaa ggctccgtga agagagagac aagaattccc ttagccaagt 164280 
gcctgtgtgt gtctgtcttt tatgttaatg gttatgaatt taaggagaat tgaaagcaat 164340 
aattttgccc ctctttaaca tggcaaatac agcctgcttt agagatgatc agcaatcacc 164400
atttagtact ggccgtcacc tctgtgcagc acaaacacac atcccgagtg acagaagcca 164460
tttcactgcc agagactctt agcggccttc agttctcttg agctggagcc actgggtctt 164520 
gtatgaaagc tcaccagaca tctcatgtgg acctcgggca tctgagccgg gaccatccta 164580 
ttacaagtgc ggaaaccaga tcattaatgc agagctgaat tcaaattgtt acttgctagc 164640 
ttaggaaaga atccttggaa atccaacata ttgtctaaat ggatcagtta atcttactat 164700 
gtgcattcta catacccttt cattgtttgg gcttaaataa cttttctgct ttgtctggtt 164760 
taatttcatc caatgtggat cgctggaaga atatgatgta tgttttagaa tagaaacagt 164820
tctgagatga agttgagcac aatttcctgt tctagttgca attaaatata aatatagcat 164880
ttgacataaa atagctggcc cgatatattt agagtacaag ttaagtgtca tccccttaga 164940 
attgggcatt gactccgtag aattcccctt tgtacaaggt gagcaaatgt atattttgtt 165000 
aaaaataagt atctgactgc caaaacggac agaaagctct ttgccatatg tgttttcagg 165060 
ccatttcctt tcctgggaaa cagccatttc ccccgcatta tagttgtgtt ttcatttgcg 165120 
ggtagataga gtaagcgcag gagttaaagg acgcgggcct ccacagccaa ggccttatct 165180 
gggacaatta tctttctcct tgcagctgtg taacttctgt ttgacacaga accacagaaa 165240 
ccctgttagt gggaaggatc acagttaata ggagaaaaat cttcattgtt catgagactt 165300 
ctcaggtgct tggcattctt atttaggtgg cttaaaaaag ttccaagtac tcattcattc 165360 
taacttatct gtgttcattg tgaaatcgtg tgtgaatgac atttggagca gatggattgt 165420 
tgtttttttt tttttttttt tttaacaaac ttaagagatt cccgaatctt tcacagtttg 165480 
tactaccgca aaccagcata acatctgcta aagaatttca tattttaaag ctgcactgta 165540 
catcatatgg aaccttaagg actttgaagg gaagagcttt ttatttactg gtagcttggg 165600 
aaatatccaa gtaactattt tttaagaaaa aaaaattcct tgagttttta gaaatagttt 165660 
atataactgt tatgctgttt gatttttaaa tattttcatt ctctagtatt attatggaat 165720 
attttatctt cccatcaaaa aaatgccaga aggtcaagat agaagtcaca acattaaaag 165780 
ggagtggata caattgtaaa acaatagatg agtacatttg cctgataata tttttgccag 165840 
taattctgtg tcctgttttc tccctgtaga atgaaatgct aaacattttt ttcaatggat 165900 
tgatgtcagt gtttactaac atgacctgtg ttaagtcaaa taaagtattt cctttgacaa 165960 
acaccatatt tcattagtgg ctttgaggtg ggcttatttg ttataagtca cattaaatgt 166020 
tcccaaatcc atttcataaa tgttgtcgag atctcaaact ccgttgcttc taaaaaaata 166080 
tgtccagtct ctttgtcata accatcctaa taaagatcta aatttcttag agtgaatttt 166140 
catttgaaag tggcttaatg ccagctagat taattcttgt ttaatctaaa tttataaaat 166200 
ttttatctta attattgaga aaccttttta aaaagagata aaaatgtcat atgtgctatt 166260 
tacattaaga tatattatct ctcttggtta taggttaaga taaataaaat tgcttatgtc 166320 
aaagaagtaa aaaaaagtcc atgacctcct tttggtatcc ccatccatct ggcggactta 166380 
atatgaaaaa atcttcctgt gggaaattag
tagtttttga agaattataa taaatagtta 
tatataaaaa tattccttgt cactcttgtc 
ttcaagtctt tcctaataat ccagaatgta 
ttcaagcatt ttctttggta ccccattaat 
ttcaagtaag gggtctaaaa gtcaaagacc 
caaagactga agagaggatg cagtattaaa 
aagaaaaaaa aaaatgagta acgttttttg 
acggaaaaaa taatatgcgc tctcctccca 
tatttataga tcaaaaatct tgcttgacta 
agaatgcaag tatcaaatcc acttgtaata 
aagcgacagg caaatctatc catgattgtt 
tttaggacct acatcaattg caactaaaat 
atgcaaaagc acatcacacc atatacagat 
aaaaatattt ttaggcattc atttagcata 
aggtgataag gttactacta ttacaacagc 
taaggagcca atttagagac ccaatcctgt 
acagatgtcc catctggatg cacaagcact 
gagcccccac acgggaagcc tcccaaacca 
ttcactgttt atttagatcc acactgttac 
tccctccctg aagaaagacc aaaactgagg 
tgtgcactta aaaaataaat aaataaaagg 
ataattgagc caagagggag gagatgggtg 
gaagagtctg cttaggaatt ggggttgtac 
tgtcctgatg tgactgagaa aggactgtgg 
tgtgagtaga ggcagagaga ccagagcaga 
tagtggcctg tgctagaata ttaacatcag 
ggccttctag atggatggaa tacagggagt 
tgactcttac acttaaaaat gctaaaataa 
gtaaaatgac aagataataa acacatattc 
gtgaactatt aaagagcttt gccatcaagt 
cttcaaggcc tgagtggttc agacagaaga 
tctgatgaga tacctccttc caggaaggct 
cctgcccacc ccactctcca gagcagacaa 
aaagagggga gcccatccct gagaatttaa 
acacataagg tggccagaaa aaaccacaat 
atcttattcc cctaggtttg tggaagaagc 
ttcatcatgg gtctcaaaac attcttacaa 
aaaaataaac acacaagaaa acatagtgct 
gtaaaaacag accagccaaa aacatctatc 
acaatgtgcc acaaaccggg tgcctaaaac 
ctagaagtct gtaattgagg cgtcacaggg 
cttccctgtc ccatcctagc ttctggtgtt 
tgtctacacg gccgtctttt tataaagatg 
tccaggagga cctcctctca actagttaaa 
acatgctgag ttgctgtggc ttaggactta 
ataacactta cgatgtttaa agatgaaagt 
aaactaaaga agaaaatctt gcagatttaa 
tggttggaat ttataattca gagtcatgtt 
gataagtgaa aacgagtcat cccaggtgta 
gaaagtgaag gaagacctgc aggatggagg 
cagaagaagg gagagagaac agggcaggca 
cccaaagtga tgaaagatat caagttacag 
gataaataaa atgatgagat aaagcagtaa 
tttctggcct ggcagctgca cagataggat 
gagaggcaat gcatgcagct tgcacacatt 
agagttttcc tattgcatat ttgggtctga 
caggaagacc tattttcctg attaatctca 
tattaaacag tatatgaaac aggtgaagaa 
caatgtgggt ttttctagac aaagttaagc 
ggccacggtt tcacaggtgc aggttctgca 
tatggctcta gaagagataa tgataaagaa 
gtggtgtcta tacaatgttg caagtgacta 
gcttgattat agagttacaa gtacaaaaag 166440
cacataaaag gaagtgatgt ttgcttgaag 166500 
ccctcatgaa tcttagttgt ctgatgatgg 166560 
tccctccact ttttctctta aaaacgctat 166620 
aataaagcat acttccccaa aatgttccat 166680 
gactgataca aaagagaaaa gtaaattgta 166740 
cgtaccaagt tcttgacatc ggtttccctc 166800 
aaagcctgaa actattctag taaaatattt 166860 
aatcctgatg cgcatttaaa tcaccttttt 166920 
caataaaaat taaaaaatgg tacctattta 166980 
ctcactagct ccctctgctg atctcctatc 167040 
attacaattg ttaatggaaa tgataggtaa 167100 
acaagctaca atgctttcat tttaatttta 167160 
gttaaagacc gacgtgcaca cacacagtga 167220 
catagaccta ggagctgtct ctgtatcctc 167280
agaaaaagag gtctgtactg tctgtctcca 167340 
tcaccccaag cttacagtct aacgaggtga 167400 
gctggctaag gccctgggta gtgcaggagg 167460
cgtaagggct acgtgaacag caagaatagt 167520 
ttatttaaga agaacatact ctgccctttc 167580 
gaaattatat tccaggctga gaaaattgcc 167640 
cgagaccacg gaagttaaaa taaattaaca 167700 
agtcggagat gcggtctgga actagctgct 167760 
cctggacata aagcatttgg ggcgggggag 167820 
agtgctgtgt gcagtagggc ttagaggagg 167880 
agctgctaca gtaattcagg ttagatatta 167940 
gctcatgatg ttaaagaggg gtgatcaata 168000 
ccaggaatgg gattggtctg gtactgggac 168060 
aataacttgg caatagtttt taaagcatct 168120 
tgctcaaact tatgtgaagg caggaatcct 168180 
catttgccaa acctggccaa cttaagccta 168240 
caaaggccag gacctaaaga aatgggagca 168300 
ctaccccagt gtcagggaag cagaagtaaa 168360 
gaaaacatgc ctggatgtta aacagaacta 168420 
ctacaagctc acccttttgg gttttacagt 168480
gaattgttct aaggtggtcc caggctgatc 168540 
aaatgaaaat cctttctggg agaatgcact 168600 
tttcccaaga ataatgggca actcacagac 168660 
attagcaaaa atcagcagaa agaagaaata 168720
cctgctgtat tggtttgcta aggctgcggt 168780 
aatgggaatt tattctcaca gctctggagg 168840 
ccatgctccc tctgaaacct gtagggggtc 168900 
gctggcctca tcgtctcatg gtattctccc 168960 
cagtcacatt ggattaagag cccaccccac 169020 
cctgcagtga cgctacttcc aaataagccc 169080
catctttcta tcaggaatgt aattctatcc 169140 
aaaattttga aaacacctaa aggggacaga 169200
aatgaaaact tatagacatg aaaaatacga 169260 
taacaagatt agacacacct gaagctgaaa 169320 
ggacagagag acgagataag gaccaggaga 169380 
aagagtgtct gacaaagtcc atccaaattc 169440 
atatccaggg agatagtcac tgaagctttt 169500 
attcgaaaaa cggcaaaaaa tgacaaacag 169560 
gagggaagtc aagaattact ggttctgcag 169620 
tctcaagggc tcggaagggg agaggtagca 169680 
cagtttaaat tggctatgag acttccggtt 169740 
gctcaagaga gccacctggg ttggaagcag 169800 
atgccagcct cattacacaa tcttaactaa 169860 
gaacagctgt ataaattgca taaagcttag 169920 
agcaaagcag ctccattatg agggaccctt 169980 
gatcatggca tgttgtcctg ttctctggat 170040 
gacccagggt ggtcagtaaa aaggtcctac 170100 
aaaatgagta aaacttacaa gatataatta 170160 
gactaattat caaagaaaga ttgctttagt
gaaattatgg tggtcagccc tccgctcggt 
cttgtgttgg cagagctgga ttgtctgtcg 
aagtgttttt ctggggcgtt ctaatgaaaa 
aatgtgactg tagttagtat gtagtgtttg 
ttgtcattgc agaaataaaa ttaacccctt 
tttgcatatt atactgagaa aataaaaaga 
gctctttaca gccccaaagc aggtgtcgcc 
gcaagggaaa ctgaggcttt ctcgtttatg 
ggggctatgt cgccaatgcc acagtcacac 
tactttgacc accattgctg ggccaggacg 
ctaaatgtgg ctgccccatc cctggccctt 
ggtagtcagc ccagctctca ttgactcagt 
atgcgctggg acaatgggaa gtatcggtag 
accaaggttc ttgggctggg gaatggtctg 
caaagggaaa ggtctcccct gtactcacga 
cctccgttgc agtgtaggtc agcccttcgc 
tgccctttgt tgtgttaaac tgaaagaatt 
gcctgtaatc ccagcacttt gggaggccga 
gaccagcctg gccaacatag tgaaaccccg 
catagtggcg tgttcctgta attccagcta 
cgggaggcag acgttgcagt gagctgtgat 
agcaagactc tatctcaaac acaaaaacaa 
ttggagaagg actcggacaa atgtcatatt 
aagacttccc tgagggccat gatggtagtt 
tcttattcct gtaaataagt ttgcatttaa 
tagtcaatga ttatggtcag ccctccacat 
atcgtggacc aaatatattc aagaaaatga 
aaatcgagta caacaactat ttacatagca 
tagggatgat ttaaactatg tgggaagatg 
atataaagac ttgagcatcc atggattttg 
aaataccaag ggacaactgt gtattatttt 
gtggaatgct aaccatgtgg gaattattta 
ctgatttttc acacacacac agaattgcaa 
ctaacatgta gtttccatcc acaaatccaa 
aagagaagct gttgaatcat gtggtgaata 
caatcattct gttattcctt tttaaaaatt 
atgtaataca gaacaatatc ttctgacatt 
gttaggaagt tacatgaaga aaacacccag 
tttccatagt cttagagaaa gtttaaatta 
tgtgttcagg ctgctgtaac aacatatcat 
tttatttctc acagttctgg aggctgagaa 
tgagggcctg tttcctggtt catagatggc 
ggtgagggag ctctctgggg tcccttttat 
cactctcatg acctgctcac tttctaaagg 
ttaggcttca acatgaattt gagggaggcg 
attaggtaac ctgcagtgct tggctgtggg 
agtgggaaca ggggtctcaa gctgccttca 
ttgcagctgc aacctactgt gcctgtaaag 
tgctcaataa atgttaactg ttattatggt 
aatcctcata gtaactcttc aacataggta 
ttagcaaagt ttagtgacgt tgcagagcta 
catctatctg tgtatttgct tatttaacct 
cttgattcag cacacgttct cttcattgat 
attttttgtt tgtttgtatt taatagtaag 
catcatgaga gcgctggtgc ccacctccac 
cttccccgtc caaggaaacc actgtctgga 
tggctctttc cagcctttta tttctctact 
cagtgtggtg cctctgagct ctgtggcttt 
ttcagcccca caagtgtcgc agcttttcaa 
tcttttccat gaagtcttct ctgggccacg 
gttctgcacg tacttactct gcacatagtc 
agttacatgt ggtcctgttc tatagtcaac 
gaatgagaca gtttagatcc attcccttgg 174000
ctcactttta gataccagaa actatatgtc 174060 
ccctctggtg caatcctgca ttagtaaggg 174120 
gtgcttaagc atttgttttg gtgcccagat 174180 
gactttttgc tcatgctttt gttgttgttg 174240 
aatcttatgc ttaatgtaca caccaagtgg 174300 
ttgttttaga aaaaccaaag gacaccaaca 174360 
agaggtcaca ggaggggttc ttagttatca 174420 
cagaagtgga atttattgaa taatattaag 174480 
tgcccacaca gaactggcct ggcgaggtgt 174540 
ctgccaccaa ggccgtgccc ctgccagaaa 174600 
tctgtcagta gggtcaggtt caaactcctg 174660 
ctgaacagct gcctgttccc tagaatccac 174720 
acgctatggt gggaagatga ctctgtgtcc 174780 
agcatatgac ggcctcagac cccagccaac 174840 
agcctccacg atgtccatca gcactttctt 174900 
agatgctcac aattccctga tacagccggt 174960 
tcagagttgg ggccaggcat ggtggttcat 175020 
ggcgggcaga tcacgaggcc aggagttcaa 175080 
tctctactaa aaatacaaaa attagctggg 175140
ctcgggaggc tgaggcagaa ttgcttaaac 175200 
catgccactg cactccagcc tgggctacag 175260 
aaacaaaaca aaacaaaaaa aaactcagag 175320 
atagaggagg aaaaagatcc aggaggcaga 175380 
agtgcatcca ttaaatacaa gtcttctgct 175440
catttttgta cattaaacgt tactgattca 175500 
ccgcaggttc tgcatctgta ggttcaacca 175560 
aataaaaata caacaataaa aaagtacaaa 175620 
tttacattgt attaactatc ataagtaatc 175680 
tgcataggtt atatgcaaat actccatttt 175740 
atatccaagg tgggggtctt ggaaccccac 175800 
cataacccat ttctgcctag tgttccatta 175860 
tatcctactg ttcaaggtca tcaccaaggt 175920 
cctccagcat aaatggggat gaatttacta 175980 
tgtccctatg ctatttgtaa ctgtggagcc 176040 
tgatcaagaa ctcaagatta gggataaaag 176100 
attagcctgt aatttaaaca tcaggatctc 176160 
tttacaatac tagtattctt acaaaacaca 176220 
actgtgtgtg gctaaatctt tagtacctca 176280 
tattgaaact tttctcaact gctatcttaa 176340 
tcaaactggg tgtcttataa acgatagaaa 176400 
gtccaatatc caggcagatt ccatgtctgg 176460 
gccttctctg cgtcctcaca tggcagaagg 176520 
aaggacacta atcccatttg tgaggatttt 176580 
caccacctcc cagtactctt gcattgggga 176640 
caaacattca gaccatagcc actggtcaac 176700 
atgggaagcc tgtgttgtaa aggacgtctg 176760 
catctaacgt cagcacacta gagatggaca 176820
catttagaat tacgccttgc atacacaaag 176880 
tgggcatcag ccactttaat tatctctttc 176940
gccttatttt gcagttgagg aaactggagc 177000 
gagttcaaac ccaagtctga ctccaaagtg 177060 
cagacacaca gaatcggatt aattagagtc 177120 
ccttactcct ttattttatt ttttaatgct 177180 
ataaacactg tgaactcacc acttacctct 177240 
ctccgagttc cacatatccc attaccctgc 177300 
atccttcgtc attcaagcct tttcacagta 177360 
gtttcgcttg gaaactctac atttctaaga 177420 
tgctcctgct agcctttctt cataaagtct 177480 
agcctttccc atcctttaag gtcctacttt 177540 
atgactgggg aatcctcact gtcttctgaa 177600 
ggcggtgagg tattcatcac attgaaatcg 177660 
caaaactcct ggggtaaaaa tgctgctttt 177720 
catcttggca atctctatcc taaccagcac
attttctttg tttttttttc agagtgattt 
gcagcacaca tttcagtttg ctttatgctt 
gatacatgtg cagaacgtgc aggattgaca 
agactcctca gatgttataa ataatacaca 
acttttcttg ctctgcaagg cactggctgt 
gccatctggg atgtgcatgt tatgtcagca 
acatcaagaa tacaataaaa caaagtggac 
acctgaatga aggacagagt tgttggaaaa 
cctggagtag tgagggcaag gtgattgcaa 
tccccacatc ttataaatag ttctccaagt 
cgcctcagct aactggatgc agaagagaca 
tttttttttt ttttttaaga catatctttg 
ttgcattttt actactttca agtgggtgga 
catacaaatg tgattgtgct tcgaaactcc 
gcaacatttg ccatctacag cactctcttt 
ccacagtcac ttcccagaaa cttgctaatc 
aatttgtaat caaaagtcac atattgattt 
tacagattta gcagctcagg gaggaaggaa 
ctgggatgtg aattcctcct cttcatgaaa 
agtgaaccat aaaaagctga aagttaatgc 
actgatttaa aaggagacac agcagatgga 
atttttttgt tattttaatc tctgcttatc 
cttcacatct ataataaact tggtaccaga 
gatcattata accgggggag gaaaaaagtt 
ctcagtgttc gctacacgtc acttaatctt 
tgtagtgact tttcatagtg actctacaat 
atgcagtcat catctgacaa ggtttagcta 
tcttttgcca aacaggtttg gtcaaactgt 
gattaacaaa tagcaaaaca gagataatct 
taaaatatag gcatcctcct gttgagtcga 
agcagtagat atcccctctg atccccatcc 
taagtataaa ctattctcta gcagttcggc 
taatattttc agactaagac agtgtctctg 
atttatcagc aatacaaaga aagttctccc 
gtctcactag tgcacagcca taaaagacac 
gtgcccacca cagcagttgt cccaggagtt 
actgcaggga gggagaagct gtgtgtggcc 
cacagctgtc tgggagcccc ttcccgggaa 
ctggagccgt tgtcggccgc ctcgaaaaca 
agccctcctc ccaaggttta cctctctaaa 
tttccctcta ctgaaattta tttgtgacat 
ctggattccc atactcaatt aaatatcctt 
aattacatgt aactggtctt tcctcctctt 
tgtacttcct tcatgaccca ccatcccatg 
ggtttctatt gcacggaagc tggccaacag 
ttgtgttaac ccaagttcac tcacagctgt 
agccatcctt cttttattga gtttttgagc 
tgcaatgatg tgaacatggc tyactgcagc 
acttcaacct cctgagtagc taggactgca 
tttttttttg tagagacaga gtctctctat 
tcaagtgatc ctcctgcctc agtctcccrg 
gcccagctca tccttccttt aaaaccggca 
agtttctcag accactcagg gaagctagtc 
tgccccacct ttaaggctgg tccccagggt 
cctgtgacta gcctctgtgg tcaaaggtgc 
gcagtgcgtg tctttatgtg gaggagaccg 
ccactctgtg ccggtcatag ggcttagaac 
gcctgaaata aaggcaaaca gtgagagacg 
tatcaaggga caaagctgaa aaaggaagac 
atgggccgcg ggagcccttg aaagtctgtg 
tctgttcagg acaggttgaa gggatgagag 
ctggctttag tttaaaccac ccgtaatgta 
agtgcctcgc tgaatattag aggcctgaga 177780
ttttttctct gctttatttg atactttgaa 177840 
gatttttttt tatttcttct aaacaaacga 177900 
caaataatag ctggcagagt gtcctaggaa 177960 
aacaaaaaca cacacaaata tttactgaag 178020 
gtgatgcaga ataaaaccga caaaattctt 178080 
cagggaagag atcaagtgtg tgtgcatagg 178140 
aaaaggaagc gagggtggtg aacacaggac 178200 
ggacccctga gtgcccaagg gaggagctgg 178260 
atgaggcctt ggtgattgga aatgagctca 178320 
tatccgaggc aggttattct gtggcaaaga 178380 
actgaataga gcctcatggt ctcggagtct 178440
gcattttgta cctaccttct gttctaaatt 178500 
ctttgttgtg gtgggtagtt caagattcat 178560 
caccagtctg acgcacgcat gggttttctg 178620 
gatcaccttc atcatcttcc aacattcctg 178680 
tgtaatagaa accctcagat tcctatggtg 178740
caaaatcaat acacacttta aaaataacac 178800 
accgtaagtt catctggtgc agctacccgt 178860 
tgtttacatt catatcacag tctagggttt 178920
aaacagaagt cgcccccaaa acatatacca 178980 
gattattgtg aaaagaactc ttactggaca 179040 
ccaattcttt tagctgcata tactgagaca 179100 
acacaattca ttccagacct aactctttta 179160 
aaaaaggctt atctatctta agaagtattt 179220 
ttccaaaatt tgacaatata caaagcagtt 179280 
aaaatgggcc tgtcctcctt gcttttccaa 179340 
tttggggaag tccttgcttg caaacgtagt 179400 
gtcccctagt tgcacagtta ccccatattt 179460 
cagaaatatt caagagtctc aaaccccaaa 179520 
attggcaatt ttgattagca aggctcatga 179580 
cagtgcgagg gcacagtgag ttgtattttc 179640
tggagtattg ggagcaaaac tgtatttttc 179700 
ttttctggac ttttccgtgg caaatgaagg 179760 
agtgggtact ccacggggag aggagctggg 179820 
cacaagcata ttacacgtga agcaggatcc 179880 
tcctgtttga atgagacact ttgggtggat 179940
accacagctg gaagcgtggc ctggtgccct 180000 
cgccggcttt tcccgggtgc accattgcag 180060 
tgcagttggg ctgctctggc aggcttctcc 180120
tgtcaaaagg gagagaatac tgtatttgtt 180180 
caggcatcac tttcacctta gtcattttgg 180240 
ccttccatat ggcccatagg aagagagaga 180300 
tataaagtct ggtggctgag caacttggcc 180360 
actgcagggc agttttaaac acagcagctt 180420 
tcacagtgtg catttttcta ttgcacctcc 180480 
aactacagaa gtttttctga aagcaagtga 180540 
tagggtctca ctctgtcacc caggctggag 180600 
cttgacctcc tgggttcaag tgatccttgt 180660 
agcatgtgcc accatgccca cgctttctga 180720 
gttgcccagg ctggtcttga accactgggc 180780 
agtcctggga ttacaggtgt gagccaccat 180840 
gctgggcaat aatacagatg ggaccaacta 180900 
ttgcatagac aaaatataca ccctcttacc 180960 
ccgcgctctg tcctccagcc tccacgcttc 181020 
ttgctgatgc agcctctgta cagcctccat 181080 
cccttctttc agcagttatt gagcatctac 181140 
tgcatgtctg gggggaattc tgcaaagaga 181200 
gccaggagaa accatgagca ctgcagtgag 181260 
tgaacgctga gcttcaagcc attcatttct 181320 
ggcaagtttt ggtgagatta agctggtagt 181380 
attaggacac ttaccacctg aatcctgtcg 181440 
gacatcctga cttagaattc cctgtgctgc 181500 
ttcctttctg atggaaacag ctctgctaac
tgcaggcagc ctgcaggccg cagtttcctc 
cagcgctagg gacctctgct tccacttctc 
tcgatgatct gtattgactt ggcctgagta 
atagaccaga agggtgtctg atgccgcttt 
aatgactcac tgaagtctgg caaatacaga 
ccatgttgaa ttccaactgg gtgtcctcag 
tagaatctgt tttgttttgt cttatttcag 
ggtataattt ccatacaata gaattcaaca 
aattgtatac agttttgcga tcacccctac 
ggtgattgct ggacattgaa ttctttccag 
ttggcatgga tttgtatgta gctataaatc 
ttggggcctg gatggcttat tgtggtctca 
atgcttcttt ataagaatgc agtattactt 
gaaagtccat gtggagaccc tgtccagaga 
aaattagcaa tgcccaaaag cacatggagg 
tgcagtttat ttgtctgagc tatctgagca 
cagattccaa tgcagagtcc ttagagctca 
catttgtctt aagctgtctg aagtcagctt 
ttccactttg ccaagtgagc gtatatgggg 
tggggtggct gacagtgtcc acagggctag 
gggaccactg tgcctcctgt cccctccacc 
gctgctctca gtggggcatt ctttttccag 
taaagcaagt caccgttatc agagcaagaa 
tttaagtgta aaatgggact gcaacaaaaa 
aaaatgtact ttatgattgc gaggaatatt 
ttcacctccc tggggacatg ctaggatggc 
tgtgtccagc cgggctcctg gagtggcgta 
agtccatgtg aacgcattga actgttaaac 
cacccgctgc tctcccctgt cctgcaggta 
gttacaaaat tattagcctc tccatagttc 
tcctctctcc tctctgccca ggtctcacca 
tgccagcacc acgacaggca ggctggaggc 
gctttccatc ctttatagtc tacctgctac 
ggcaaggcag aaattgcagc acagagcgaa 
gcttttacaa gtagattttt cttaaacaat 
taaaacacct ggatgatgaa ttgacaacaa 
cggggaagac agtttttttc tgtggtgata 
gaaggcattt tacagagctc accttaatca 
attttttcct tcttcaaaaa cgactgatac 
attgggacaa tgatagtgag tggagaatat 
cttttattac tgaggatatt gacatgaaaa 
cggccgattt cccaccagaa attccaggct 
tcacttggag caccggcatc tggaaatgat 
gaagacaagg cagtggggaa gggaagggtg 
aggcattttc atcttccatc tcgatagaat 
aatagctggt cagccaaaag ttttctgctt 
acgtgctgca accaaataca ttatgtgtaa 
ttatatagtc atgggaatgc ttgtatacct 
atacagcaga atatatcaga taattctgca 
aatagagaag tctcgattca taaaagacta 
attaaaaaga catggcactg tccccgaaat 
cttcattttg aaactttgca cattagcacg 
aggaacattt ggtgccggaa gaatgattag 
ttgcagagcg ttttcaagag catggaagag 
aaaaatacag atccaactat gtaatcatta 
aggatgtcag tgaggatgga gtggagggaa 
ctcatgcatg aagtgaatgg aaacatttgg 
ttgtaaccaa aaactacacc atgaacaaaa 
gaagctgagg aggttggcag tttctcaaac 
gaatctgctg ctatttgggt tctggttgac 
aaccaatttt tgttataggg cttgctggat 
aatagcttaa gaaaaacata aaggaaaaaa 
agagtgcagg ctgtgggagc cgagccccgt 181560
ggcttaccac ccagcgcttt tcattcggct 181620 
ggtgttggaa attgccattt atttttgctg 181680 
tgcgtgcacg tctctggtgg tctgaattat 181740 
tataaaaaat aataataatt tgaaaggaaa 181800 
gccctctctc tgaatcgact tctcacttgg 181860 
acatttctat cccaagatct actcctggct 181920 
ctcatggttc ttgttccccc agctttatgg 181980 
ctttcaatgt gtggttggat ggcttttggc 182040 
actcaagata tagaacactg tttctcgtct 182100 
ttttcactgt tatgaatctg actatgattt 182160 
acttggtaat ttttcagaag aatagcagtc 182220 
aaaagttcct gatgataagg ttgcagcctc 182280 
gcaagggagc ttgggtagat aagaaagcaa 182340 
gcacagacat ggactaagtt aaaggatggt 182400
agatacttcc cctcctgact ctattggtga 182460
agtttcctct cacttacgtg ctggggacag 182520 
ggctcccctc aacctgacgc atctctcaac 182580 
cccatcttgg ggaggtagaa gtgaaagggt 182640 
agactgaggg tgtggagttg atgatggttg 182700 
tcttgaggca ggctgacact ggggccagat 182760 
ttctcctagt ccaggaaggg aatagcagca 182820 
agacaggcca gcccagcagt gatcccttga 182880
ctatacattc acttaaaact tttttttttt 182940 
gaaaattgtg cttaggagaa tgtccctcag 183000 
tgccaaggtc tttggggtag gctgagcccc 183060 
aagagaggat cagacatctc ccagggaggc 183120 
agtctggttg aaccagcact gaactgcctg 183180 
cgtgtctctg gcggccacat ctccgggctt 183240 
caaagtcaat agtcaacctc agttttgaat 183300 
ttcccatggc ttctcaccca agccttctgc 183360 
gctgcccttg ggccaggtca ctgcagtgtc 183420 
ccagttctca cagaaaagac tcgaaagggg 183480
ttataggcca ccaggacaaa ggatcaaggt 183540
tggaaaggca gtcactgaag ggattctttt 183600 
cactgtatga aaacaaaagt acaaaattat 183660 
gagtttttct ggaacatcct cctgtgggct 183720 
gatggtcagg aaatgtagtg acatagaagt 183780 
atggcttttt cacttattaa gttttctttt 183840 
cttaatttat gggaattgtt tccagtaaaa 183900 
ttatatgcta tacttcctgt cttccttcat 183960 
caggatcttt gtatccaatg agttcatcga 184020 
ttctgacatc agcgtgcatt gctctgcatg 184080 
gaaatcctga acaacaaagt ttgttttcag 184140
ctaagcttca gtgactgcct actgtgtgcc 184200 
gtctaactgt gctctgagat gagaactata 184260 
tttcttagtg atctcaagtg tttccatgac 184320 
attgccaaag acctgttgat ttccaaacca 184380 
gaattgtcat aaaattgatg agatgcgaag 184440
gaactcttat tatggaaatg aaaataattc 184500 
gttttactct aaagtatcta aaagacatgc 184560 
gatcttgctg tgttgcattt caaatggtac 184620 
ttctttataa tagcaaaaag tgggggagtg 184680 
gtaaagcaca ccaagctgaa aaaagtattt 184740 
tgttataatg ttaagtgaac aaaaaaaaaa 184800 
cacatagaaa taaaaatgag caataaagcc 184860
tgtcctaaat gtgcgttggc ccatcatcac 184920 
tttatgtttt ctggaatgtc tcataagcca 184980 
agcaaagcag gccctgcagg ccctgggtgg 185040 
tcatgtcaga tgcccctcgg ccactagaca 185100 
cagaggccta atctggaatc tggttctaaa 185160 
acaaatctgc aatgagacat tgtcacaagc 185220 
taataataag tttttggaaa taagcctgga 185280 
aaagcagttt attgccatct gctaactcat
tgatggtgat ctctgcttca aagatgagaa 
ctcacaggcc agtggagact ggcttcagac 
tcttttgttg tttgtttgtt tttgtttttg 
tcttgttgcc caggctggag tccaatggcg 
tgggctcaag taattctgcc tcagcctccc 
acacccggtt aattgtgtat ttttagtaga 
ctcaaactcc tgacctcagg tgatcagccc 
ggcgtgagcc accacgagcg gccaaggcct 
tcagtacgat gagtagttaa aatcactgtc 
tctacttatt gtggttttgg ctacatccta 
tcttgatctc atatggcatt tgtagacaat 
gatttctcac tgctggcccg tgaagccatg 
tggaatgaat ggtcagtgga acataggcat 
acctgaatgt cctgaaatgc aagagtatcc 
tctttctggt gtgctttgtg ggttgggtga 
gagaaacaaa taatttcctt tcctcctgct 
cccaagacac tcagctcctg cactgcattt 
acaacttcgt ctaaccccct agaattcctc 
tgaaatacaa aaaaaggagt tgaaagtgag 
cactgtaaaa ttccttactg tatgtttaaa 
tatggaacag aagtagggaa aaaattcgac 
agctctgtta gagtatcatc actaatctct 
aatagagaac aatacaatat agtgtaacac 
ttagttatca ttatggttat tattattatt 
ccaaactgga gtgcaatggg gtgatcctgg 
tgatcctccc acctcagcct cccaagtagc 
gctaattatt tttggtagag acagggtttt 
tggactcaag caatcctccc accttggcca 
tttttgtgtt tttcataaat taaatcttgt 
agttaaagct ccctttgcat ctttccctct 
taactgctgt cctgaattta acgatgattt 
cttttttttt tttttttttt tttttttttt 
gaaaaggggt ctcactctgt cacccaggct 
caacctctgc ctcccaggct gaagtgatcc 
cacaagcacg tgccaccaca cctggcaatt 
gacgaggtct tgccatgttg ctcaggctgg 
tgccttgtcc tcccaaagtg ctgggattac 
tgtttttctc ttccccctac accccaaata 
attgctgctt caaggccgca gtttggacac 
ttttttttta gacagagttt cgctcttgtt 
ggctcactgc aacctctacc tcccgggttc 
agctgggatt acaggcatgt gctaccatgc 
ggatttctcc atgttagtca ggctggtctc 
tcggcctccc aacctgctgg gaatataggc 
gattaaggct gcagtgcaat ggcgcgatct 
caagtgattc tcctgcctca gccttccaag 
cctggctaat tttgtatttt taataaagat 
cagactcctg acctcaggtg atccacccac 
tgtgaacccc tacagctgac ccagacacca 
tggttgcggt cttaggcacc cttataaata 
tttaatgagg tcatgttata aaattgtcgc 
ccttgcaaga tactttcttt tgcaaagatt 
cattgcttcc ttcttatgca aaagtggccg 
ctctgtcagg tagacagaag catggataga 
tattgtctct tctctcctat aacctcgtat 
tttttgttgt cattggtgtt ttgacagctg 
agaggtttgc tgagttgagc aggggtgtgg 
ccccgctccc cgtgtgccca ggtcataccc 
acagcgtttc tattaaaggg ttctttgtgt 
ggtgagagtg gaatgagtgg tttacccagg 
ggaattttac tggagttttc actgcagtgc 
tcccttggag catgctaatc tttccaaaac 
ttgattcttg cagtaaccct agggtaggta 185340
aatcgaggct gcctcaggtc acttgacctc 185400 
tcgggccttt ggacctcaag gccctggtct 185460 
tttgtttgtt ttcctgagat ggaattttgc 185520 
caatctcggc tcactgcaac ctccgcttcc 185580 
gagtagctgg gattgcaagc atgtgccacc 185640 
gacgggtttc tccatgttgg tcaggctgtt 185700 
gccttggcct cccaacgtgc tgggattaca 185760 
tggtcttcct atcgcatttt gacaccttgc 185820 
attggctaca tgcctacttt ttatagtcac 185880 
gttgaactct agggctagtg tttattaagg 185940 
cgaatgttga gtgataagcc ctggtaacgt 186000 
gaaatgttcc catggaaatc accccatgtg 186060 
ctttctctcc tgtcctctag gttttaaaat 186120 
taagagcact ttagaaatat ctttgcggtt 186180 
ggtaccgtat tccaggacac gtggccctta 186240 
tcagtgttat tggtaaagtg ggaaggtagc 186300 
ggatagaagg gcgttcaaat tccaccaggg 186360 
attttgaccc ttggcatact ctatatttgt 186420 
tctatctata tgtagtaggt atatcgtgtt 186480 
atttttcaga atacaatgct gggggaaacc 186540
aacgcaaagt gagagtggga aaccatgtga 186600 
tttttcctta tacctatatt catgaaagca 186660 
cgtgtaccca tcactcagca ttgctcaatc 186720
attatttgag acaggacctt gttctgtcac 186780 
ctcactgcag ctcaacctct cgggctcaag 186840 
tgggactaca cgtgcgtgcc accacacccg 186900 
gccatgttgc tcagggaggt ctcaaactcc 186960 
attttaatat tttattatag ttgtttccat 187020 
aactattata tatttcacag aatattataa 187080 
ccaattccat tcttcctctc tctctaaaag 187140 
ttaaagtcat ctaggctctc gtttttcttt 187200 
tgttgctgtt gttgtttgtt tgttttaatt 187260 
gaagtgcagt ggcgctctgt gggctcactg 187320 
tccaacctca gcctcctggg tagcagggac 187380 
tttttttttt ttttttgtat ttttggtgaa 187440 
tctcaaactc ctgagctcaa gtgatttgcc 187500 
aggcgtgagc caccgtgcca ggccggctct 187560 
aacacagagc tttattcctg cctcagtcaa 187620 
tatgtttttt agggtgtggt tttttttttt 187680 
gcccaggctg cagtgcaatg gcacaatctt 187740
aagtgattct cctgcctcag ccttccaagt 187800 
ccggctaatt ttgtcttttt aatagagatg 187860 
aaactcctga cctcagatga tccgcccacc 187920 
ataagccacc aaactcaact tataatttat 187980 
tggctccctg caacctctgc ctcccaggtt 188040 
tagctggaaa tataggcaca cgccaccacg 188100 
agcatttaat tatgttgtcc aggctagtcc 188160 
ctcggcctcc caaagtgttg ggattatagg 188220 
tgtttttatg gctggatttt gtctttgctc 188280 
gagctttgaa gagaacatta ccaatgtatt 188340 
ataggacttc tcaagaaaag acagcctctt 188400
gagatcattc cacaacaata gacctctgtt 188460 
tccctcccat cagaaggacc cccgctggca 188520 
aggctggtgg tgagctccag gtgccttccc 188580 
aaccttcctg ggttttcctg ggtgcatgtt 188640 
gctgtccagg caaggctgct gtgtttgagc 188700 
ctgcagggcc tagcctggcc tcccaggagc 188760 
aaacaggagc attccttatg ctggtcctgg 188820 
taggaatgtt cagcagagcg ccatgagccg 188880 
gcacctctgg accctgggag tcacagctgt 188940 
agcccagggt aggacacaga gggcttccac 189000 
actcattcgt gggccctcat agaagctcct 189060 
agggcattac caagaaatag cagtccttga
gaatttagat ctcatgtgtc catgttgctg 
tcaggccaga ttgtttggag tattgccaac 
cgctacgaag acttcagtgt tctcctgcag 
gattggcctg tcacttgcca tctctcattt 
atacatatgt cagaatctaa attacagcat 
caccaggctg gggaggacct agtttaaagg 
cttcaggggc tccataatgg ttttaaactg 
ttttgactta tgtattaacc aagaacctct 
gacaggcaac ttgaggttga gtctgttgct 
ttgtctttgg gaacctcgct cccctatcag 
cattcactga gcacccactc agctgcgagg 
agtgagagag gcaaggatcc ctgtcaacat 
aggaatgagt gaatggaata agcaaatagg 
aaacatggga gatgggtcag gagtgtggac 
agaaggtgaa tgagttaaaa gagatcagga 
ttccctggag agggagctgg ccagctgcat 
ggggaagagg aggcagcagc acagtcggga 
gagatgcggg agccctgcag gtttgagtga 
catcgttcca gctgctgcag gtttgagtga 
catcattcca tctgctgcag gtttgagtga 
catcattccg gctrctgcac agggaagagc 
tagatggtaa tagagtcaga tacaattggg 
atcagaaaag ttagggtgca accatttgtt 
ccccaaaaag gagcctgtga gagtcagatg 
catgagacag tctctttctt tttattcatt 
tgtccttcag cacagtttgc aggagcattt 
cttgccaagc atgctgaaca ccgtaagcca 
atatgttgat cattttattg ggctccagtc 
tggaccaagt cacttcatcc ctttgggttt 
gtgcagtgag ctctggcgtt cttcttagcc 
cacagccccc tgcctgcatc atggcaggtt 
ctgctaaggg aagtagcccc atctgtcagg 
atcttataag cctgtacaca tggcagccaa 
gacccagcac aatgggctgg gcagtaagga 
aggagctcgt gagtatgagg gcatgatgag 
gggcccagga gacagtatta cggcattggg 
cacaatgcaa cgatgccaaa aaacggtaga 
ataaatctct tttctgctga aattgatagc 
gaataatcat tattgaccat gaaatagcta 
acagtaagaa tgaatgagaa aactcttgca 
tgcttcttca tgtaaattaa atcagcggag 
gagggtagga aggatgcaat ggggcaaggt 
gctggactca cagccctctg cctggtgtgt 
cctcaatgaa ggtaataaaa tcacctgcct 
atggatgagc ctcagtcctg tgtggggtct 
tttaccatca tcactgtcga cccggagcca 
gcgtttccca ccattgtctt attttttcgg 
gattccttaa agccaggact agaaggtaga 
gtcttgtaag atgaatttta tgtacttgtg 
gcaaaaataa aagacaacaa acagccccag 
ttcaccagct cggccaagag ttctgctggg 
acgacaaccc ggaacacgga gggagggctc 
agtttccctc tctttgctaa tttcttcgtc 
cttttctcat cctcacacct cactgcgccc 
ctactgtcct tccgtgaccc acatcacctt 
gggcctgccc ttcacttacc catctcctta 
cagaaccctg cttgttggtc tttctgcccc 
acgtgccact catgtctgct gaataaacag 
agccaacatc agctcagcgg gccttcacgt 
gggtgaggat gctcctggac acagaaatta 
cttctgggag gcaaaggcgg tcaatcagag 
atacgtgggt tcccacacca gccttgggga 
tcatatccag tgaattctga aacagtgaag 189120
agggcgtcct gggcacagag cctgctcgca 189180 
tggccttttt tctggagaag aaagtactga 189240 
gggactgcag gggactgcag gggaagggag 189300 
ctgcgatgct acagagaggg aaggggaggc 189360 
gtggaaagac ctgccctcgg ggtcagagca 189420 
gatagaagag acattacttt agctccttct 189480 
ttctttaaaa tcgaagtttt tctaatctac 189540 
tgtaaatctt aagactatat agttgtcaaa 189600 
aactaactta ctgacttcgc acaaatcact 189660 
taaaacagag atgattgatt gattcaaaag 189720 
cactgcctgc atacttggga tatgtcaggg 189780 
ggagacttca ttccagcaga ggagacacac 189840
gtatgtactg taggagcaaa caagggatag 189900 
tcgagtccac agcgctaaac tgtcccattg 189960 
agttggccaa gtagatgcag gggaaaagtg 190020 
gcacaaggcc tgatggacgg ctgagccagt 190080 
tgaactggca ggtgggcttc tgctccagaa 190140 
agagtgcatg gtccaacacg gtcttcagag 190200 
agagtgcact gtccaacaag gtcttcagag 190260 
agagtgcact gtccaacaag gtcttcagag 190320 
atcaggggca agagttgatg cagacagtaa 190380 
caggcaaygg ccctttacag atgacgaagc 190440 
ttcagtttac aaaaagggaa gacgattaat 190500 
aagaaattaa gaaatgaata atatgggtca 190560 
tatttatttt tacaaaaaag tatgtttctg 190620 
agagcacacc cgtggagtgg cccttttatg 190680 
cgtgtgacac atcttccatg gacatgaaag 190740
tcagctctgc cacgaactgg cactgtgcct 190800
gcgtttgctc ccctggaagg taggggaggg 190860 
tctgctgcag ctgcatgagt gggtctatgg 190920 
atacacagta aagagatgaa aggaattttt 190980 
atagttggct ccattgtgtc taacgtaggt 191040 
ggggacctgg ccgccagagc cgtaggagat 191100 
agccagactc tggagccagc gtggaggtgc 191160 
gggtgcacag aggaacccct gggctaacag 191220 
ctttgtattg ccggagacca gcacagatcc 191280 
actgaaaacc ccagccagat caacgcgaga 191340
ctcctaaaat gctaagacac atgcagygga 191400 
agaaccagct gagaaaatac agaaggacac 191460 
tagaggatac ggtcagagtt agcaaccagt 191520 
aatctaaaac catcccgtag accacattta 191580 
gggcaggaga tgggcttagc atccaagcag 191640 
gatctcagca cttcttgtac cttatctgag 191700 
ataagcctgc agtgagaatt agaggagcaa 191760 
ggctgctcac aaggcaccat ggacgccgtc 191820 
atggtgaaag caggacacag gcaagcccca 191880 
cttcaggaag acattagact tctaggaaga 191940 
ctccagattt tggctacaag tggcaaatat 192000 
ccaagtgcca ttggaaatac cgaagactgt 192060 
gaacccggag ccctctccca gcccagaaca 192120 
ttttctctgg gggctggtgc tgctgtggac 192180 
agcgctagga agggagaggg aatgaagagg 19224 0 
tctgggaaca tttccttcaa cagagtcctg 192300 
ctcctgaacc cactcctttc tgaatatggt 192360 
ggtcctctcc ctcataagca catcctaggt 192420 
gaagaaacgt gagctctcca aagggaaggg 192480 
cagcacttga cctagagcct tgcactgagg 192540 
ccacatttcc agatgacgat gtccttttcc 192600 
atttagttat acttgtgccc ccgctcaaca 192660 
gctctgaggc aggaaggagg aaaggggatg 192720 
tgagcaccag agactccgtg tacctgggaa 192780 
gccagggtgg ggaagagggt ctgcagagca 192840 
agtttaggat gcagcacatg ccaagctttt
gcagggaggg gagggattgg aaagtaggag 
agccggcgac cggccagagt gcagctctga 
ggggtggggg gcggtcgtga ggaatgtcgt 
gcagaggaag gagtgggctg ggagaggcac 
aggctttctt cccgaacacc caaggctttc 
ccccaatgtt tgctgaagca aaacctctcg 
gtgagaaggc catggccctg tgtgggtgag 
gggactaaga agagccccct gccacccgaa 
ctgccgaggg tggccaggct ggagctgcgg 
tttccaaaat gcattgctct catctcagct 
acatgcagcc ccgctgagct gggtggtgaa 
catgtatttt aaaaagcgtg aaacctattg 
tcttaaaagt cctgataaaa gtgaaaatcc 
tccaactcag gcttccacgg tcatgagtag 
aaagaagcct tgatgttgtt acgtgattac 
aaggatattt ttaaaggaat gtgaagcttg 
ttttttccaa gtgactcagg cttacttgaa 
atgacgccag gccccaactg tttaaagcag 
gaggatgttc tttgtccatt attctcaccg 
ttaattagaa tgtttctaca tttgccaaag 
attataggaa cagggcttga tgtaatagct 
attttattga aacaggaagt tcagaagctt 
gcttgttgag tcatctgttg ttggaaatag 
tttatcttca ttttgttttt ttgaaagtag 
tagagattat gtgtattttt aaaaatcagc 
ctctccttgc tcccatgtac caatttttgt 
aaaacaataa cttgcttcaa tgcaattact 
tgtattagag attaacaaag aggaaaagta 
ctaggaagcg gtcaataaag taaccttttc 
tgctaaataa aagttacaga aatatctgga 
ggttggaatt tcttgccctt ttccaaagga 
aaacctactc atcctggcaa gagtgcggta 
atagtgctta gtcagggacc cccaggggag 
gccagacctt ctctagctgg ccgtgggtgc 
aggtggaggt gtaactggta ttcctgtggg 
gcctgctcct tcagatacat ttaccattgc 
cccttggcgg cacatctcca gcttacagca 
tgaggcasgt taggcttgtg acatcacatc 
atgtgacaag agaaaaacct tacagctttg 
ttgactaaat aacactgaac aaaatgattc 
g^gQ-ttQtg gcaatgtttc aatagctgag 
gggcaggttg gaggccttca gttcctttac 
ttgaggagac attgcctaga actactggac 
tactcaacac acttccatgc accgttcaga 
gacaagggcc attacggggt caccaaggga 
gagctgagtc tccctgtggg ctgggggctg 
atgagagacc tgtccctagg cctccctgca 
tcccagctgt gcccacctgg gcagctgcac 
cctcacatct ctcagctgtc tttgcaggtg 
ccctgtgctt cccagtttct agtcatttcc 
agcaaagcct tcaagacctc tccttaaact 
ttgcctcctg gctcttcctg ccctaatacc 
cagggagccg gacgtctgtg gcgatctccc 
atcatatggg acacacacac acacacacac 
cacatgcaca tcatacatac atgcccacca 
cacatgcaca ccatacatac acatacacac 
acacagtgca taccacacac aacacacacc 
cccacaccac acacacacac acccaatcac 
aagcctttcc taattatcta aaggagaagc 
gagaaattag tgtcaccctc ttttatggtt 
aatactatca tataaaaagg gtaatcagtg 
tcttctgcta gatgatgagc ttcctgaatg 
cagagtctca cagtcaggaa cagaactcat 192900
gcaaagcaga agccccgaac ccaaagacag 192960 
gcctcagaca tgaggggaga agaaggggat 193020 
tgtccaggct ccacccggcc caccagctcc 193080 
acaccagaac agctctcctc ggggcaaagc 193140 
caaaaggtaa acaccatttc ccccaagcga 193200 
tgtgagccgg cgggcggctt cacgacaggc 193260 
gaagcgcagt gcggctcccc cctgcgtggt 193320 
aggcgcccta acacttcaga gagcggatgg 193380 
cttcccaccc gatgcattgc agaatgtaac 193440 
cagcgttaaa acacatgtgt gcacacacgc 193500 
aagaccctaa ttagttctga ttccttaagg 193560 
agatgctact tcctagcgcg aatacggggc 193620 
gaggcgcgcc tgggaagtgg gaatgttccc 193680 
gaagtcctct tcctaatctc agtatcttaa 193740 
ctaaaaggaa tgccttcctc cgcggaccgg 193800 
tgacaggaat tatcgatacc tttggaattt 193860 
gccattacct cggagttagt cagggactgc 193920 
agcgcggctt agtgaaagaa tgaaaaaacc 193980
tgatgaatga tgcttgtttt cctctccact 194040 
aaaatgttgg aatggagaca aaaacctgaa 194100 
tatttgtaaa ggaaacacaa cttgtttggc 194160 
agtacacaca agtacaacaa attctcaggt 194220 
tctcctggta gttttcccct tgatttactt 194280 
tgagggtagg aagttacaga gagattcaat 194340 
tatcaagatt aaataaagca agcgggaatt 194400 
aattatgtac aagatgaggg aaaccaaaga 194460 
aattcaaaag taaccattac tctggggaat 194520 
ctgtggtttt ctttctctat gttctatttg 194580 
cccacaggag ctggttaata gttcgcttca 194640 
gctgagttgc tggagacaca gaaatcttca 194700 
ttaggccagg acattgctgt caaatctgca 194760 
tttttaggac tcactagtgt gctacttcta 194820 
tgcaagggag agagggtccc cagcagggac 194880 
tggcctggcc acctgtagcc ctcagcgcac 194940
agtgacagtg tccatctttg acatttaaga 195000 
caccattggg gattggggca gtactggcca 195060 
gagtctgagt gtctctagca tacctctgac 195120 
ttcctaggtg gggcagagac tttacaatac 195180 
tattgaaaga tttcttaagt ttttagttta 195240 
tactatgaaa cgaaaggatt ggacctctgt 195300 
caacgcagga ggcacacagg ccatcgttgg 195360 
agctatgggc tcccatcaag ggtgagtgca 195420 
agacatctca cccaggagac gggagcatgg 195480 
atcgctaaac acagcagtgc agaggcagat 195540 
ggaaataggg actggagccc ccaggaagga 195600 
gctttgtggc cctgcagcca ccacctggag 195660 
gccaccacct ggagacagag tccctagacc 195720 
tttccagagg attattcctg cagcttccac 195780 
catctctgga aaacagttct catcagggca 195840 
cttctctgaa ggttctagtt cagactcttg 195900 
gccctcctcc tcttccgtcc agccacctgg 195960 
ggctgcccgt acgggactgc tcacctcctg 196020 
tcccgccatg acacccccta cctgtcctcc 196080 
acacacacac ccctacgcac acccacaccc 196140
gaaatacaca caccatacac accacccacc 196200 
aacacagaca ttaaatacac atgccactac 196260 
acacacacac acccaatcac atcacatata 196320 
ataccacata tacccacacc acacacacac 196380 
ttttctggaa agcattcccc agagcttcta 196440 
tcatagtaat gtttttatat caccagtata 196500 
taccatagta attaattctt taagtatgtc 196560 
caggctctga agaatttttt catagtttta 196620 
aatccactgc atggaataga gaaggctctc
attggaaggc agtagcaagg cacaaagtgc 
ccttgagcag aaagacttag gaagggtgct 
agcccgtggg gagccttaag caccaagagc 
tggagcagga gagggaccgt ccttgcattc 
ggaaatgaag gtaggatcaa gagtcccact 
caggacacct gagcctcagc agtgtctgtg 
ctctggggag aatctggatg catgcgggag 
gtggcggcgg ctgggctgtg ctctcccact 
gcttgccttt tggttcattt cctttcttta 
tgttggctgg ttgggtttgt cgctttcctt 
gatcttatca tctccttccc tcttagtcac 
tttaactctg ctcctgttct gacctcccca 
agattttatt tcattttggc tatttaaaga 
tctggtctta aaatagaaaa tatatatata 
tacataggcg catgtgtgtg cccttgagtg 
actataaacc cagggcatca gtctcctgac 
ccaaacacta gttttcagcc tgtattttct 
agtcaaatca gaaagtgatc agcctctacc 
ctctgaggtc ttggggctca taggaatgtg 
gtttcagaag ggccagctca gttttgtctt 
ggaatggcaa tggaatgcat aggggcatga 
atattttgat tgaggtgtta ttaattcaat 
atcattctta aattatacct cagtaaagtt 
tcgaaaatga tcctgtagag cgtttatgcc 
cccagggcca tttgcatttt aggatataac 
atgggatggg aatctgggga tggaagtacc 
ccttattcat aaggctgagc ttggtttcag 
ttttaagaag aggatccaaa ctgggactta 
ctctgaactt cacatcaagt caatactctg 
caggtctact aagggacccg attcccaaga 
gcaagtcagt ttcaaagata atttggtgaa 
atcagtattc aacaaattat tttattcact 
gacatatgtc tcctttgatc cgcacacaca 
ctggtggcag agggatgggg actggagcct 
cagcccccat gctgtcatag gccgcagcca 
atcgcagaca cagcttctcc acggagccct 
tcacccactg cctgccacat tccagccaac 
aggtgattca gatcagatct ctgttgtaga 
ttaaacattc cctaacaaac ctccaagact 
atgtgagagc acatgtcaga attctccagc 
ggagtctcac tctatcgccc aggctggagt 
tctgcctttc aggctcgagt gattctcctg 
gtgcgtgcca ccacgcctgg ctagtttttg 
ttggccagac tggtctcgaa ttcctgacct 
gtgctgggat tacaggtgtg agacaccaca 
aaaacttctt agaattacca acttaagtac 
aattgagtct ctaagtaagg aggagtaaga 
aacactttca aaaggaaaga aaattaggaa 
gggaagaact aaggactaga gaagtcaggt 
tcgcctgacc tctccctgtg agtctcagtg 
ccagaaagag gccaggtctc ctgacaccca 
aatggttact catgttgggg gaattttata 
gaatctgtcc ttcctgagag cttgtcactg 
ttcgcttggg actcacactc cttgcaaaaa 
actagagcag ggagtccttg cctttcattc 
atgcgatggg agtttgcatt ttggaccaaa 
acaatatgtc cttaaaccaa gatgtaactg 
actgtgcttt gagtagccag accacatagt 
agtcagtggc tgctgatctg ggccttcccc 
ttaagttctg tgaatcacag acattatttt 
cactctgtca cccaggctgg agtgcagtgg 
cccaggttca agcaattctc gtgcctcaga 
cataaacttc ctgagtttaa atggaatcgg 196680
agtgagagcc aagctcagga aaaccagtgt 196740 
cgctagcgag gagggaggca acaaggggcc 196800 
agggcggtgc acactttgtc tggcacgggc 196860 
tgtgcggatt tctatggcaa tgacatggag 196920 
gggaagtggc ctggcaacca gaggtgtccg 196980 
aggataggag ggaaagccag accccagcct 197040 
gaatggatgg aagggagggt gtggggctga 197100 
cacagagcct tccccaaagc ggggaaggct 197160 
atacacagca aattcctggt caccctttgt 197220 
gttgtttaca agctccaggt atttgtgaca 197280 
ctcttggccc aactctgcat attttacctt 197340 
ctctcggaag catatttgct tggtgttttc 197400 
gatgcaataa actaaatatg gcctggcaag 197460 
tgtatatttg tgtgtgcatg tgtgtctgtg 197520 
tgcatccgtg tgtgtgtgtg tggccggtgt 197580 
gtcattgctt gcactttttg ccattctccc 197640 
cagtttcccc aaaaatgatt ttttaagaaa 197700 
gccggactct gcttcagtat ccatccatgt 197760 
cttattttca tagtcccatt aacatgaata 197820 
cagttttctc actggtgatt gtgcaggggt 197880 
gtgaactttt cgggatgacg gaagtactct 197940 
gtgtcaaaat catcaaaatt tttacatttc 198000 
gattttaaaa gttaaacaca taccctttgc 198060 
tttatatgaa tttagctaat gcattctctc 198120 
tgatgatgtg gaaggtacta gcaaggaagt 198180 
ttcctgcttt cagtaagtta cataggcact 198240 
ataaataatc agaaagtagg ttgtgcaagg 198300 
gtaacgaact ctgaaactgc cacttgcatt 198360 
tatgctacaa ttccatctta cattaaaaag 198420 
aataaatgtg ctttttacaa tgcttgattt 198480 
gatatcagag ttatttttac aagattaaaa 198540 
ttgacttttt ttttttttta acctgtctgt 198600 
ccctggccag taggaaacag gcacactctg 198660 
gatcttggac cttccctgtc tcatctagct 198720 
agtggccttc cacagcccct ccatggagcc 198780 
gttctcagcc ctggaggccg gcaatgtgct 198840 
agaagaactt ttgaccgaga agtagaaact 198900 
ctccactacc ctaatgatga atttttaaaa 198960 
ctttgcttgg gtcggtcaaa atacagtgga 199020 
ctacgtttgc tgttgttgtt gttttgagac 199080 
gcagtggcgc aaactcggct cactgcaatc 199140 
cctcagcctc ctaagtagct aggactatac 199200 
tatttgtagt agagacaggg tttcaccatg 199260 
caggtgatct gcctgcctcg gcctcccaaa 199320 
tccagcccag cctactttta tactatgaac 199380 
aatagaagct tttgaaatta gctgggggga 199440 
gcaagaagat cagaaggaac cacagaatca 199500 
attgttcggt gccatccctt catttcagag 199560 
caccccgaca ggaccctatg tccctccttg 199620 
gtcctggtcc cacagcaggt gcttggggac 199680 
gccccgctct tgttgggtcc ctgaatctgg 199740 
ttcttttttc caaaagttga tatccagcta 199800 
ccctttctct cctccctgcc tgtactcctg 199860 
agcttgtttc acccaggggt gagttttgta 199920 
caatgcattc cccaaaagca gaaaagtgtt 199980 
gactccgcag caaataaatc atggaaacga 200040 
taaacctcta ctgtcttatg aaataacaat 200100 
agctggactc tagactctaa gcagggatga 200160 
agaaggatgc caagagatca agttttgttt 200220 
tgtaatcttt ttttttatga cacagagtct 200280 
cacgatctca gctcactgca acctccacct 200340 
ctcccaagta gctgggatta caggtgtttg 200400 
ccaccatgcc caactaattt ttgtattttt
gctggtctcg aatgcctgac ctcaagtgat 
attacaggca tgagccacca tgcctggctt 
tttaaaaaac tgaatccaca ctggtaagtt 
gtaaagcttt tgataagttc agtggctcct 
agtgttgcaa gattctggag agtactttgt 
ggcaacacaa attactgaag ccttgaaatg 
aataatatat ctaaaacatc tagcaactct 
atcaaataca attcattcca attcaacttg 
tacatcttga accaaaccat ggctttgagt 
atQQtgggtg ttcatgttct attaaagcaa 
taggttcggc agcataaacc agtgcctgtc 
tcctgccccc tcccgcacag ggagcaaggc 
gagagtgtat gtgagaaggt gtatgtgaga 
gcatcagcaa ccttaggtga tgcggtttgg 
tcccacatgt tgtgggaggt aattgaatca 
tgatagtgaa taagtgtcat aagagctgat 
gctctcttct cttgtttgcc accatgtgag 
aggcctcccc agccacatgg aactgtaagt 
cagtcttggg tatgtcttta tcagcagtat 
aggctgatgg tgttcaggac acactgtccc 
cgcaaaagca ggaaggcccc tctcactttc 
tataagatcc tcatttggga gagtctttcc 
tctgaagaca cagagcacag agaagaatca 
tttatcacca ttagctcact cccagtttgt 
tcatcaaacc taagtacaaa atacccaagt 
gtgataactc ctgagtcaca tgaaacacat 
tactctttag ttacagggaa gggccccagc 
tctttccttc cctactgata tggtttggct 
tagctccctc aattcccatg tgttatggga 
gggcagtttc cccccataca gttctcatgg 
gaataagggg aaatgccttt cacttgcttc 
agacatgctg tccaccttct gccgtgattg 
gaccattaaa cttctttctc tttataaagt 
catgaaaacg gactaataca cctaccaggc 
tcacgcccaa gaagtgggtg gagctgggaa 
tttgatccac ccccaggagg tgaggattgg 
aaggtgagct ggagcccaca gcaggactcc 
aggaaaggca gacagagctg actcacgtgc 
gaccctgatg agctgagcac agtgaaaaca 
gctggtggca ggggggctga gcaggtgttg 
gcagacgtcg gcctaaaggc aatcgcaagg 
ccatcaaaaa ggaagctcat ctttcactgg 
catttgctgc cctgatttct tgtcctaacc 
ctctgttttc ttctcctgtg ctcagtgcag 
tgacagatgt aagattattg ttagagatat 
ttctcattcg gtcctttgct gtcattagaa 
actctgacca aacctcgtat acttcaatca 
gaattgtttc aaactccccg ggtgactgcc 
ggtaaaatct tttctcagcc tgagcagccc 
actaatgcct aggcctcacc accctccatc 
ggcccctaaa acaacattct aacagcttcc 
cttctcctct gaagcatgaa gacccacaga 
ccaaattctc gcctaaaacc ccaactttca 
ataaggttta caaatttcta tgccacctat 
ttctataagt tgtatagttc tttccctgtg 
gaaaagcaac caaggcatct cggcagcatg 
agcgcatgga cgggacccag agcatsagcg 
gcgttcgagt tttgacatat ttctcgcagt 
tttatgactt tttaaaaagt tatttgtgga 
tgtagcctta aaagaaaaaa acctcaaatg 
attgagttcc agtgtcagtt gtctgaatcg 
attccgcctt cctttaactg ctagttcgtc 
agtaaagatg ggtttcacca tgttggccag 200460
ctacccccct tggcccccca gaatgctggg 200520 
tgtaaaaaat ttttaaagcc aatttgcttg 200580 
ttgttttaat aaaaaaattg tgagtaagtt 200640 
gtaggcagac aataaattgc taagtcccaa 200700 
tcatactttg aagaatatgc ctgattataa 200760 
atgaggttgt ttccatttac tcgcacataa 200820 
caaaagaaga gagtaaaaag cttttgagaa 200880 
aaaattccca acagtccgtg ttgcatttta 200940 
aaaggcttca tttaaaaacc taacctatat 201000 
ggtccctgtc ctagttggag ggaacttccc 201060 
gaccagggag tgtcaggagg atgtgctgct 201120 
tgtgctgaat ggagatattc tagtaaggag 201180 
aggtgtggca tccacaacaa aactaataaa 201240 
ctatgtcccc acccaaatct catcttgagt 201300 
cagggacagg tctttctcat gctgttctcg 201360 
ggtttcataa gggggagttt ccctgcacaa 201420 
atgtgccttt caccttccac tatgagtgtg 201480 
ccattaaacc tctttctttt gtaaattgcc 201540 
gaaaacagac taatgcattt ggaaaccaag 201600 
catttatagc accttggcat ttcagaaaat 201660 
ccctccttgc ccttctcccc tggggcaggt 201720 
caatacttgg aggaaaggaa catccttgtc 201780 
gaacaaacag gcctttctca gtgaccccag 201840 
ctaatcacct cctccaccac tatccactct 201900 
ttgcctgttt ctgtgggtct tcctttcctt 201960 
actaaatatg tgtgcctgtt ttcctcttgt 202020 
catgaaccta gcaatgggtg aggaaagaaa 202080 
gtgtccctac tcaaatctca tcttgaattg 202140 
gggaaccagt gggagataat cgaatcatgg 202200 
tagtgaataa gtctcatgag atctgatggt 202260 
ccatttttct ctcttgtctg ctgccatgta 202320 
tgaggcctcc ccaggcaggt ggaactgtga 202380
atccagtctt gggtatgtct atatcagcag 202440 
ccggatttgt ttggcaataa agtgatccat 202500 
aggccagacc aaccatttgg aatagtgttt 202560 
caggggctga ggggagtgct cacctccagc 202620 
agcctcagca gaggaactgg agagcaaacc 202680 
gagggtggga gaggtcgcac ggcctgcccg 202740
atgccaggcc tcacctgccc gtgcttaccg 202800 
aggtgttcac aggtgagtag gagaggaaag 202860 
agaaatgcgt tgagaattgt agcactgtat 202920
gtgtctttct aattgttaga cttgacactg 202980 
ttcaagcttg ttagaacagg gactcaggga 203040
ggcagcagga ctcacttgct aagtgctcac 203100 
ggacccgctt gctcttctga gcttccgtga 203160 
tcgtctgggg agaattttgt cactcctgct 203220 
gaatgctcgg agttggggct gcagcaactg 203280 
ctagcagtca agtttgagaa ccacgggcat 203340 
attagcttca cctagggagc tttaacaatc 203400 
ccgtgttctg acttaattag cgtggggtgg 203460 
caggcgatga gaatgcacag ctaggatgag 203520 
atactgcaga gttgctgggg gtggccctgc 203580 
atgacattgt ggacctgctt tcgtgttatt 203640 
cagaccattt tttaaggatg aaatcaaagt 203700 
cattttatcg taatattgaa aaacgacagt 203760 
ctgctgacta gttcacgcag ttaccaccaa 203820 
tgtgcccact atcggggaca gaaacctacc 203880 
tgttgaaaac tatgaggcat gaaatccaga 203940 
ttcccaagac gattatgttc ccatcactta 204000 
atgctttaaa aaaatccaag tttggcgctc 204060 
ccttcagcga aagtcagggg gaaaaaatac 204120 
atggagaaca gaaagtccca tttgcatgtg 204180 
gcttttggaa aagctaagcc gggagcgatt
ataagaattt gaggaggatg tcccgggaga 
ctgccatgct ctttgacaac atcatagatt 
caatgccccc cgcccccccc ccccacacac 
ggggtgcagg tcatggttgc cttattacac
ctggataagt tctttggcag ttctctcaac 
agtattaaaa gcaaggccga ttatataaac 
tgcctgaggc ctttactgcc aagagccgta 
actggccttg agagtctcct tgctttgaca 
tgctcatgct tctggttctc ttaaatagat 
ccctcttctc ttttgcaaga gctgggatgc 
accttgattt tagacaaact ctaagttaca 
tttacttaac ctaatcaagc catgttgttg 
accttaaatc tcactacttg ttccaaccat 
gctgctcttt tcagcctttc tagttcgact 
tgtttccttt ctgcagtgac aaaattgccc 
ctttcttttt agctccctgg tttaaggctt 
attctgctta gaaagtacag acccctaagc 
tctgttatct ttactttccc agagcctggg 
tcttcataca ctcgagacaa cttcatgccc 
gatttttctt ccagaaaaaa aaaaaccttt 
gtgtttcata ttacaaaatt gaaacttggg 
aatccatgcg cttgtggctt tgcttgtcac 
cagctgcaag ctgcaggcag aactgttcct 
ctactgctac tgcttctact gttgctacac 
acactcacac acacacacac acactcagaa 
tttcatgcca aacaattctg cagttcactg 
ttgtggcacc gcctgcctgg agttagcagg 
ctacccatgc caattgcttg tcccagatcc 
gttgccacaa ccccctcctg ggatttgtaa 
aacacttaac attgaccaat tcatcacaaa 
cagagaagag atgcacaggg cccggggccg 
tctcaggggg catcacctcc tgcaccaggg 
taacgtcagg attttttttt attttttttt 
tggagtgcag tggcgccatc tcagctccct 
ctcctgcctc agcctcccca gtagctggga 
ttttttgtat ttttagcgga gacggggttt 
ctgacctcgt gatccaccca cctcggcctc 
ccgcgcccgg cctaacgtcg ggatttttaa 
aatcattggc cattgagtga accccagacc 
tccaaagatt gggcacgttc ctctggcact 
agcatacacg caggtagggt tggaaagggc 
catcgctcgg ggaattccaa gggtttaggg 
aaatacatat ttctcgttat agcacagtgt 
agctgcccca tcacattctg cctatttaag 
tctcctccac agccagccca cttcccgggc 
tgttctgagg ctttcatcct tctccacgtg 
agggctaccc ctttctccat tacctctgcg 
tgctcttcag caccggtgcc tgccccgctc 
cctgggcttc ctgaccccca ctgcgtccgg 
ctctgcccag ctggctggcc tgcctctgtt 
ggcgcctcct ccgctcacat cgccgcttca 
ctgctccttc tcccagctgg tggtgcctct 
ccgcaatttc gtgggtgtgg tggaagcaaa 
ccactctttc aagaagtttt gttacaaaaa 
tgcaggttgg gagaaataac agcatttgtg 
aacatgacaa acgggacaga aggaagacct 
catggggttg gcctgctggg ggatcggcct 
gggttctgag gaaggcaggc ctgaagcagg 
tgttactcca ctgtcctcag tggggagcca 
gaaggccagg cagacgggaa tggccaggca 
gtggatagaa acacggaggg ccacacggcc 
ttcacccgcg gcgatgccgg tgcagagaag 
atcctgatgc gcttttactt tttgcataaa 204240
gtgagccact tctcatttcc caggcctcgc 204300 
ttatttttgc cgggaatctc attatcaaag 204360 
agactgccag gtaaaccaca gagggtgagg 204420 
accctcctct gccatcacct ccttttttgt 204480 
ttttatttct gaaacatcct gaaacatctc 204540 
gatactccca ggcctgacaa cacatggttt 204600 
aggaccctct aagtcatgtt cgctattttt 204660 
tcctcttgtc tccattgtca gactgttaaa 204720 
gcagatgtgt ggggctgggt tgccactgag 204780
agacagaagg cggtttggaa aacacgagcc 204840 
atcaggtgtc ttcatttatg acatttaact 204900
gctactgatt agaatatcct tttataactt 204960 
cccaaagtct ggcgtcaact gtcattgcat 205020 
cttagcaaaa gccataatct tcctccagtc 205080 
agggaaagga aaaagaacag catctatctt 205140 
tcttttcccc catgatgaaa aactataatc 205200 
ccacttccaa aagaaggatg cattttcaag 205260 
ggtctcccag gccagaagtt gacagaactg 205320 
atttccttaa aactaagaac ataagacgct 205380 
cttgttcttt caagaactgt ttcacggaca 205440 
acttttgaac tgcaaattta gcagaaaatg 205500 
ctctactcag atgtctccca gacccctctc 205560 
ctaaaagaaa acaaactcct gtttttccta 205620 
acacacatac acacacactc tctcacacac 205680
aacacttctg acaccaaatg tatgggtttt 205740 
cagacaccag ctgagtgtcc tacaatccaa 205800 
tgaaggactc agccccgcaa gcctgccccc 205860 
ccgttctaac tgaccagcgg taaatcaggg 205920 
cttgctacag cagctcacaa aactcagaga 205980
cgatattttg aaaggatgtg aatgaacagc 206040 
gggagcaggg catacggagc tgccatgccc 206100 
tgtgttcaac cccaaagctc ctgaaccctt 206160 
aaagacatag tctcactctg tctcccaggc 206220 
gcaagctccg cctcccgggt tctcgccatt 206280 
ctacaggcgt ccgccaccac gcccggctaa 206340 
caccgtgtta gccaggatgg tctcggtctc 206400
ccaaagtgct gggattacag gcgtgagcca 206460 
ggagcttcat tacataggca ggactgatga 206520 
ttgcgggggt ggggctgaaa gtttcaaccc 206580 
cggcccccag cctccaggag ccacctcatt 206640 
ttgtgataaa tgatgaagga cgttcttctg 206700 
gctcactgcc aggaacccgg ggcagaaacc 206760 
caccccctca ctctgcctaa tttggtgact 206820 
ccaagccccc cttccccaag gccaacctcc 206880 
gtgataactc ttctgcctca gctggagagt 206940 
ccgcctggca gtgctgctgc ctgtcttttg 207000 
acctggctag tccacatcct ccccgacccg 207060 
agtgcatgtc ctcatccctg cagcctccac 207120 
caccgctggt tgcgggcctg ctccggctct 207180 
ccgacctccc ctgcctggcc tggtgttctg 207240
cctgcttttg ctatctgcac tttccatgtc 207300 
gagaagagga ctgagaaccg cctgtgaacc 207360 
ggcagagcgt gtgagtttag tgggcgtgcg 207420 
gatgcaaagg aagtgaagag ggaaggggtt 207480 
ttgtttgttg ttgtgacggt tttgagccaa 207540 
gatggagcgt gtccttgaga aggcgagagg 207600 
tccatatggg ggttcctctc cagcagcctg 207660 
tgccgggtgc cgggaagcag gagacatctc 207720 
cggctgagcg tgagaaaggg cttataggct 207780 
gaggagggga ggacgagccg ggtagaaaca 207840 
aacggtcagg ggactggcac accagccaga 207900 
ctcggcatct gaatttaacc cgggttgtgg 207960 
tttgactcag tctgacgtgg agagaagggc
tgctggttta ggggctggga catggagggg 
caggggcctg gggctgcaga caaggtggga 
ttctgcagtt ccccggggaa acaggagccc 
ggcacctgtg gatgttctga gacttccagg 
gacagagtca tcagggccga gaggaatgac 
gtctagcagc acggggtgtg gagctggctt 
gtggcagcca ggggcaggga tgcttttgat 
caggagcctc catgacttcc ccatctgaag 
tcaccagaat agttaatggt gtgctccgtt 
ggctgcagac atcgtttgta cttctccccg 
tgggctcagt gcaggccaca ttgtcacgtg 
ttagactcct aaaaatgttt tgtttttcca 
ttttgcaatg atcagcacta ggatatggtt 
caccaaagct tggccgctcg gacaggactc 
tcagatgcct gcctcctccc atgcgtggtg 
gccaggctgg cgcctcctcc cctcaccccc 
aatcttacca caactacgta ctgcctcctg 
ccctccctgc gtggaggcag gccctttgtc 
tctaagttat cagaggacat tagcaaacac 
agccagcctt atgaataatc aacgtgactt 
ctcagttgtt attgaagctg ggggcaaaaa 
ctcgacagct gagctttctt acacatgcct 
ctccccattt gcacacagga agccagtcac 
cattctgtct tgaatctccc tcccttcccc 
tttcttgggt cttcctgatc tcaacccctc 
gcctcctaac acacggatcc ccccatggcc 
aaccacgctc acactgcgta acacgaacgg 
tgtgaatgtt ttagcaagtt agctcttgca 
tcacacatgc cgtgagtctt cagacaccca 
tttccgtgag gaactgtctt tctgctcacg 
cctccctcaa tgccccaagc ctactcaggc 
acttctaggg tgtcctgaga agcaaagacc 
tagcctgata ctcacaatga aatgagttca 
acttctgcag ggggtgatgt ggggatgcgt 
ggggcaggag gcaggtgctg ggaggaggat 
gccaggatcc aagaaaaaat ggaaagtaga 
tcttttaaat gtgtgtacac agtacagttg 
acacacatat agattagcgt aaccactaat 
gcatgcatta aggaatgata yggcatattt 
gagtctattt aaaagagaca gtgggcacsg 
agagaacccc tgagtgctgg gctacaggat 
ttgtggattc tgaaatattt atttaatacc 
gatcgcataa ctctgactat actaagaacc 
agcgctttac cagttaggaa ggtttcgcgt 
ctgtaaggcc tgcgagtatt tcctggacca 
acagccctgt aagacacgtc ctgtatcgtt 
ttagcaggcc gaggagcgac tttaaagggt 
ctaattgaaa tccaacaccc tgggctggag 
catttcttgg tgatggataa atagctggga 
ctgttctgct tagctttgtc aacagtacag 
ttcggagagt gtgagctaaa acccattaaa 
aatgtggttt aaggcaaagc taatgtcctt 
cctcaccctc tgcaacccag ggctcctrgg 
ttctgccact gcctcctcaa ctacatgtat 
aacattttat gtcaccatta gcagaggaaa 
tcactctgac tactgagcag tcctgttgtt 
tccataagct ccacttaatc ccctcttctt 
ttctgtaggg gaacatttgt gcttcactgt 
gccaccttat ctgcttgcag ccttcattgc 
taaatagctg ctggcaatgc aaacgcgctt 
tagagggaat gcttcctacc ttgtaggaaa 
tatcagtatt cctaatcatt aaatgtgttc 
cagggagtca cgggggggtg gtgggctgtg 208020
tgaaggcggg agtcagtcgc atccgctggg 208080 
ggtggcagct acggaggaag ctacaaggga 208140
aagggaccgg ggggtgaggg ggttggaagg 208200 
aagtgggaca ggatcagtga tggagataga 208260 
agtaacagcg aggttgaagt gggcaccccc 208320 
gtggacggcc agggaacagg acgctttgag 208380 
cgccaaggga gaagacttga tgcagagttt 208440 
acctttttta ctttaatggg attgaagtga 208500 
cctatttctc tggtttttct aaggtccaca 208560 
gtgccaaaga ccagttaatg ccgactttga 208620 
taactctaca ctgagaatta ttttagaagg 208680 
aatggtggcc tctgggtctg acttcacctc 208740 
ttggagacgg ttgtgcagag ccagggcttt 208800 
acgatggaag acggtcaggt gccccaggtt 208860 
aggggcctgc ctcctttata gctttccgct 208920 
atctcctcca gaggaagacc aacttaatca 208980 
gaaaaagcct gatttctcgc cccctcttgt 209040 
cagtgcccat gtggcttggt gggtggtctt 209100 
acacgtccat tggcctaacg cccaatctgc 209160 
gtctctgtag ttcaatgcct atatctgcct 209220 
agatggatta ttcattggaa acctcaaaac 209280 
gtgtggcccc cgtggtatct tagtgttcac 209340 
attactggat tcctggtgag tttgactttt 209400 
aaccccatac cccaccctac tccatccctt 209460 
catctgtcct ccacgttgtc tgcatagtga 209520 
ttgtctgctc aggtttctaa ggtccccagt 209580 
tctggtccac acctcatcac ttggcgtgca 209640 
attattgcct gccgatcccc tgggctgcat 209700 
ggtctcagga cctgaggggc tcctgtgtgc 209760 
actccatgtc acatgccacc atcaggaagt 209820 
tcccactttc ctgcccatga aatgtgtgta 209880 
atgtccctgc atttttgcat cctcagaact 209940
cttaacgaca caacgaacga atgtgcaggt 210000 
gcattgattc tgtggctcag ccctgagttg 210060 
tttatgtctt aggaagcaca ggaaggcctt 210120 
ycaatgtaag cgttaaaaga acacatttta 210180 
acttttttgt atacaattct atgagtttaa 210240 
tataagattg tagggaactg gggaaaaaat 210300 
gggggacaga gaacaggctt gatgaggaca 210360 
caattggagg ggaaggcggg gcagggtttt 210420 
tcagtaaagt tattgatgag attggctgca 210480
tcgaggaggg tgtgagtaga ttgtgctgat 210540
actgagttgc acccagagct tgcattactg 210600 
attccgtact ttaaatctaa ggtgacttga 210660 
ctcagaggaa gaatgctgtg aatgagaact 210720 
gttgagatgg gaaagtgcat cttaagacgg 210780 
gagctctgcc tagagggaaa agcgaatgca 210840 
taaatgaacc gtcagccacc catggggctt 210900 
ttccttgaag ctagaagcca tggggaaatt 210960 
tctgccttaa ctgacttgga ggtaaataga 211020 
tcaggtgaag acacaaaggc aagcacagcc 211080 
cggccttaac tgacggactt tcctagcagt 211140 
aggagctcat ggcagagaaa gccttctggc 211200 
acatcagtgt atatgcatgg gtatgaaatg 211260 
gctggaactc tttcaaaccc cacccaaaat 211320 
tattttggag gccacttaac cctggagcag 211380 
tcatgatttc ttttaaagag acatcttggg 211440 
aaaactccat ttgaggcctg ctcacggcct 211500 
ttgggagctg ttttacagct tcataagttg 211560 
gtctgtgggc aggaaatgaa ttctgtctgg 211620 
gccaatattt tttgtccatt agcaagttta 211680 
ttcggattgt cctttgaacc agttatagca 211740 
tttgagttaa gtaaaatgaa tacactgttg
ttttttggtg gggggggggt gttttttttg 
tctcgttctg tcgcccaggc tggagtacag 
cctcccaggt tcacaccatt cttctgcctc 
ttatctttta acatttttta gctagaaact 
gttatatctg aggttttcac tgaggtaaca 
ccatcgttcc ccaacttacg tctgtcccct 
gggtgtgtgc at^tgtgtgt gtgtgtaaag 
tgttgggggt ggggagttgt attgttttga 
agtgaagggt cacaatcaca gttcactgca 
ccacctcagc ctcccaagta gcggaaacta 
atctatgtct gtctgtctgt ctgtctatca 
ctatctatct atctatctat ctatctatct 
atggggtctc cctatgttgc ccaggctggt 
acctcagcct cccaaagtgc tgggattaca 
gagtttagag cacattgctg taaattgcga 
taataaacag caagttgact tcagaatttg 
ggtaacacac aggctccttg gcgagagcca 
catctaatat ttgcagctta gaattcacag 
aaatggtgct ttatttgctg gataggaaaa 
tcacagcaca aagaaactag ctactgaagt 
cttttctact gcattacaaa aaggtttatt 
tagatctcag tttttggtta agaacaagca 
gcattttaca ggatttttaa aaatacacag 
atggtgggga aggcaggggt gagaactgat 
ctatttccaa aaataggtgg attatttaaa 
agaaacaaaa acttaacaaa aaagtcactt 
gaaaagaaca aaataaagga tgatttcagt 
caggacccaa ggctttccgc cttcccactg 
ggggcttcag tgatccaggc gtcacattag 
tctgctttgc atctgtttat aacagtgaga 
attctcacgt tgcattggcc aggattgagg 
gtattgcgat caccgtgatt agctcagacc 
catggccagg tggagatcgg tgccccccag 
cacagacaac agtgtctaac aggaaaagcc 
tcaacaatca aatactttat tttattgttt 
ggagcgcagt ggcgccatct ctgctcactg 
tcctgcctca gcctcccgag tagctgggat 
ttttgtattt ttagtagaga tggggtttct 
gacctcaagt gatccactca cctcagcctc 
ctgcgcccag cctttttttt ctttagatag 
cagtggcaca atctcagctc actgcaagct 
ctcagcctcc cgagtagctg ggactacagg 
tatttttagt agagacaggt ttcaccatgt 
gtgacctgcc cgcctcagcc tcgcaaagtg 
ggccagatac tttcataatt aactttttga 
atactctttc ttgattccat ttccatgcag 
ttttcttgca gtgtgactca ccatttgcca 
tctctttgtc tgcaaattaa atcaatagcc 
gccaaaaata ctcacaaagt caccatccag 
tcctgcaagg cacatgaaag ctgctgaggc 
caaggggaga gaacaagcaa attccatgag 
gatgctggag ctatattttc acagtaggat 
aaacctagca gaagaatagt tcactgtttc 
atttactctt tgctgctctt cccccaacct 
tgcctgctgc ccctgactga gaggaccccg 
gagaaggtcc cttcatccct tccccgaaat 
ccattaattt cagatattgt aggaaaaata 
atggcacata acccatagca tctcgcaggg 
gtagttagga aaatgcgtgt tcagaataac 
agatgaccac acattgccag aatgagaagc 
gtttcccaac agctcattga aacaatctcg 
ttcgggaatg gctctaggaa ttctactttt 
tttattttat acctgtatga aagttatggg 211800
tttttttttt ttgttttttt tgaggtggaa 211860 
tggcgcaatc tcggctcact gcaagctcct 211920 
agcttcccaa aagttatgat ttttaaaaaa 211980 
tctgggtcaa tatataaata gatgagcctg 212040 
acaaaaataa aacaacacga tgccaccgag 212100 
ccacatgtcc tgcacacact cctgtttctg 212160 
gtttgcaatg aaattagaat cattggtttt 212220 
gacagggtct cgctctgtca cccacgctgg 212280 
gcctcaactt cctgggctca agtgatcctc 212340 
taggcatgtg acaccatgcc gggcttgctt 212400 
tccatctatc tatctatcta tctaatctat 212460 
atctatctat ctatctttct atctatctag 212520 
ctcaaactcc tgggcttaag caatccaact 212580 
ggtgttagcc actttgccca gctgaagtta 212640 
ttaccaaggg tattgaaaaa tccatgaaaa 212700 
tgcgtttgag gcttttcgcc ttgatctcca 212760 
gtggtgatac aatgagaaca ccgcctgctg 212820 
ctaacttttt aaaatgtacc agtgtggggg 212880 
ttggccaaga tcagaattct gaaggcagtg 212940 
cacatcctaa acattcgaga ggttgatttc 213000 
tactgcttat ccatatagtg agatagagat 213060 
ttatcataaa tgtgtgtgtg tgttgtgtgt 213120 
agaatttttc acagttgtta actctggtaa 213180 
ctattattca taatctcaat gatgaacaag 213240 
attattatta ttaggatatt ttgggcttct 213300 
aaagaattta ggggtctttt tttctgacat 213360 
ttggtccgtc agtgacttag aagtgttttt 213420 
ggccattttc agcgtgtccc gtggcctctg 213480 
acatgacagt gtccagcaaa gagaagtatt 213540 
aaaactcccc cagaatccca ccagcaattg 213600 
ccagctgtgc catgcttagc gcagtcattt 213660 
catcctggga cttctccttg ggcttgaaga 213720 
aagaagtctt tgttctgcca ataaagaaga 213780 
cctttttact ttataccctt ccgtattgct 213840 
gagacagagt cctgctgtgt cgcccaggct 213900 
ccacctccac ctcccagatt caagcgattc 213960 
tacaggcgcc taccaaaatg ctcggctagt 214020 
ccatgttggt cagaccggtc tcgaactcct 214080 
cccaagtgct gggattacag gcgtgagcca 214140 
agtgttgctc ttattgccca ggctggagtg 214200 
ccacctcccg ggttcacacc attctcctgc 214260 
tgcccaccac cacgcctggc taatcttttg 214320 
tagccaggat gatcttgatc tcctgacctt 214380 
ctgggattac aggtgtgagc caccgtgccc 214440
atgtatgtgt gtcctacttt aaaatgaaag 214500 
cttggccccg tgatgctagg gaccatggct 214560 
aagcaaatct cttgccttgc atcagctcag 214620 
ctttccactg cctatctcgc aggatatagt 214680 
gaagaatcat ttgcccctgc tgccactgtc 214740 
tcggtattta ttatgctata aaattcaaca 214800 
catatataag tgtatcggat ctactccatt 214860 
cctcttttgt taaatattac agtagtagga 214920 
tctgattttg tgagtgatgt gggctgtgga 214980 
gcaccctacc cctgcctccg aggtcagcct 215040 
acgtcacccc accccaggtt atactcctct 215100 
acatcccctc aaatctctaa tttgtgtgaa 215160 
agcagggaaa atacgcaaaa caaaacgtgg 215220 
tgtgtacact gaagaagtct ttaccaaccc 215280 
tgggccttcc cgcggtcctc tgagtcaaac 215340 
agagcagctt cacatccctg cttctgaaat 215400 
agacacctct ctcccccaaa cccagcgtgt 215460 
gcattgcctc actctccctt tccccgtcca 215520 
aaccatggta ttggatttac agcatttctt
gcctggagcg cgctggattg aatgacgctc 
agaatcttgc cgtcacttgc acacgtcacc 
atggctctgc tgtgctaaca gttgcttttc 
aaaggcctac actttttttt ttctaatttg 
agaaaacgtc agttttatgt cattaatgtc 
attcacagga gcatggcagc cttacattca 
ctgcaacagt tcgcctacgc tatggagact 
tgtctgcccc acagctgtgc cgaagygagt 
ccctctttgc cgaccagcca gygatgtttg 
agctctgtga actagtccac ctgtgcggag 
gcatcgtcat cgggccctac agcggaaaga 
aatgggtctt aggtaagaat ccaggcacac 
caggtttcca gggagggcgg cktcaggctc 
ggttgatgtc tcagcctcca gcatctgccc 
cccgctcctt gctgttagca gacgtacagt 
tccccacttg acttccctgg ctcgtgtgaa 
cacccctgtg ggcacagact ctccgggcac 
cttgtcctgc ttcaggagtc cctggcagcg 
gtcatctcct ctcccaggta catctcatga 
gttaaaagac gtcaaacgac tccatctttt 
atgtcccact ctggcgttca tggagctgcg 
gaggtggtag gagctgagct gagatcggag 
cgggctccag gagacttgca ggtgatcccc 
aaggacggcc tggggaatgc ggaggaagca 
cctgtggtgg agtcatatgt ggcgggacaa 
accctcctcc atggggttgt gataaacatg 
ctacaggctg ggtggcttca aacaacacgc 
taagatgggg tatcggcagc gttggtttcc 
gctgccttct tcctgtgacc tcacgtggcc 
tctgtgtgtg tccaaatgtt ctcttctcta 
cccaatggca tacttttatt tgcttttatt 
acccaggatg gagtgcagta gcatgatcac 
aagtgattct cctgcctcag cctcccaagt 
ccagcttttc tttctttttt tttttttttt 
caggatagtc tcaaactccg gggctcaagc 
gggattacag gtgtgagcca ctgcacccag 
tcaaggcccc atctccaaat acagtctcat 
tacgaatttt gggcagacac aattcagccc 
tggggccaag atccttaccc gactttagag 
tctctctctc ccgttctctc attctttttc 
tcctattcag tctcctttct tagtactttt 
ttctcatccg ctgctcaaca ttatccctta 
ttacattcgt atctaactac ggacatttta 
ctttcatata ttttagaagt gtggcaatca 
ttcatctttt gcctggaaac caacttccaa 
ccaaaagcta agaggctgtt tactcttttc 
cctatgttct gaaacagagg ttgttgtttt 
ccacggactg cagaacagaa ctgggcctga 
gcacacgatt gatatccaca gtgcatatca 
tgtgccgtgc agtgcccgag cctgcctctg 
cagccccttc ctgtgggtcc tgcgtccttg 
acttcctccg gttgttttgc cgctcggctc 
tctacctgct taatcctgag gcttcgatcc 
aggccctggc cacaggcccc agcctctttt 
tgatttctca ccaattatgc catctgcctg 
gctgcagacc tcacccatgt gagacaggtc 
tagtgggcgt ctccagcgta gtgggcgtct 
acctctccag tgtaccaggc ctctccagcc 
tctcaagtat ttattggctt gtatttttct 
gcagctatgt acgaataaag aagggtttat 
tcaccctctg ctaaacatta tcctttaata 
tgtgaattga attccaatct ggtagctgcc 
acatcctata aaagtccttt tctgccaaga 215580
tcccagcaca gccggcattt gcagtgcatt 215640 
aagttacttt agtgagagtt cagcctagct 215700 
aatattttgt ttgaggcttt ggaataattc 215760 
tttccttgga gttttacgca tggctacttc 215820 
atcatcttct ctggattctc agaattcaaa 215880 
gtctattctt ttcataaaaa aggaagtaaa 215940 
ggagtggtcc cacctctgta attctrtcts 216000 
gccacttgtc tgcagggccg taccgcggaa 216060 
tctcgcctgc cagcagcccc ccagtggcca 216120 
gccgggtcag ccaagtcccc cgccaggcca 216180 
agaaagcmac agtcaagtat ctgtctgaga 216240 
agacgctgtg gtgtggtcca gatctgtgga 216300 
acaccccctt ccacgcagct ggggcacctg 216360 
tggcagcgtc gtgtggtcac cctcggcatt 216420 
tcacgaggaa atgggaactc tgactggact 216480
aaatccaggc tacccaaagc caccccrggc 216540 
ccctcttaga ccctccctcc ccagtgcctc 216600 
cccggcactg gggcccaagc ccccgtccct 216660 
tcactccgtc tgctcatgtg ctcaaagggt 216720 
atttgacaaa gtgagcacag tgtgaccgta 216780 
ccaggcgccg tgtgcgattc tggggaggaa 216840 
gaggctggaa ccccacgccg tgctaacaca 216900 
ggagaagagg gttaaggaag agtgtgaagc 216960 
ggggcagcgt ctgtgctaga aattacctgc 217020 
gcctagggct ccactgtggg gaaatcccac 217080 
ttagtttgct tgggctgcca tcgcaaaata 217140 
attgtctctc agttctggag gctggaagtc 217200 
cctgaggcct ctctcctggg cttgcagaca 217260 
tttcctccat gcacacacat ccctggtatc 217320 
aggataccag tcagattgga ttagggctca 217380 
tatttttttg aaacagtgtc tcgctctgtc 217440 
agcttactgc agcctcagcc tctctggctg 217500 
agctggaact acaggtgcac accacgatgc 217560 
tttgtagaga tggggtctcc ctatgttgcc 217620 
gatcctcctg ctttggcttc ccaaagtgct 217680 
ccccagtggc atcattttaa cttgtctttt 217740 
cctgagttac tgagggttaa gacatcgaca 217800 
ataacaatga atcactctag tttcagcccc 217860 
gtacatcccc tctctctctc tcaatctctc 217920 
tctctctttg cttccatctc cttccatgtt 217980 
gcatgtctct aaatcctaaa cttctggctt 218040 
atagacaagt agatactgtg tttgttcaag 218100 
caagtatctt ttacatgact gatggtcatc 218160 
aaagtaattt tttactctgg tgcagagtaa 218220 
aaaaaaaaaa actatgattt tagtcacagt 218280 
taaatgccaa gaatataacc ttcaaaacat 218340 
gtttttctgg agaagtgtat tatcaaaatg 218400 
aagcatgtct gggccagctg acggaactgt 218460 
acaggcagtc tttttggagt ttgcaaagcg 218520 
cactcgtgtt tccaggttgg gtggctctga 218580 
tgtggagtca cgcttgctcg gcagctgctc 218640 
tcccgcccgt gggttttcag gaggcgaatg 218700 
cgcaaagccc ttcagagttc tctgacttcc 218760 
tctttcctcc tgtaacttgt gtcctgtttc 218820 
tgcccttggt aacatctggg tattgtgtgt 218880 
ccctcactcg ccggccacca gaccccagtg 218940 
ccagtgtagt gggcatctcc agtgtagtag 219000 
cacactctct gagatgtaag atcacgtagt 219060 
ctttgtgaag tgaattccaa tctagtagct 219120 
ttttctgtcc gtacatactt ctggcttttc 219180 
gacaagtaga tttttttgta tttttctctt 219240 
gctatgtaca aataaaggaa ggtttatttt 219300 
tctgtccata catacacacg taaacctaca
gcctcatcca ggtccaggct atttgcttat 
tttttttttt ttctgagatg gagtctcgct 
gatctcagct cactgcaacc ttcacctcct 
cccagtagct ggaattacag gccccgccac 
gagatgaggt ttcaccatgt tggccaggct 
cctgcctcgg cctcccaaag tgctgggatt 
tcacctattt tctgtggaat gcatttactt 
ttgcttaccc cacatgctgg ttaaaggagg 
ggcgtaggct tcttaagtgt ggcagattga 
tgccccttcg acaaagcaca ttgtgtcttt 
attataacaa atgcttctct ggacaatgtt 
ctaggaatat atcaaaccat tttaaagcac 
taaattatga aaagacaata ctcaaaaaaa 
caactgcttt gtaaggtagg gtccctgagc 
gcccatgcct gttgtcttag ctacgtggga 
cggtgagcaa cgatcccacc actgtactcc 
taaaaagaaa aaaaaaagaa tcatttttca 
gtcttgtttt gcagatgtcg taaactcaca 
ctaggaacct ctctgagaag tttcttttct 
tggccagagg agggaaagga aggtgggtac 
gcatccgagc accacagtcc acccgccagc 
ccaacaactc catatctatg acgataaaaa 
tctttcgacc tcagctctga ggtgaccctc 
tgctggcata gaacagggag tggaggtgtg 
tgtaagccac gcctcactca cttgctccct 
taacggggtt atcggaaagg gcatgattac 
ccctgaaact gtattgtact tgggccaaga 
aatggggttg ttgtttggtt tttttgtttt 
gcacacttgt gggtggttga acatggataa 
cggccctagg aaataaaatg ttacctttac 
agcatttgct aatggttgca ttttcccccc 
ataagtcacg aaacgaagac cctggggtct 
gcccttgggt ggagctctga gccctggcgc 
tggtcagctc tgtgcactct tccctccctg 
tttctcatag gcgtatttcc actctcaggc 
gataagacat acatttatgc tattgtggga 
aaacaaaaaa aaagtttagc ctctgcctga 
ttgaacatgt cccatgtcga tgttttcagg 
atgctggcmt ggctcattca tcgtttcccc 
cgctcccatg gggccactcg ggaggcctca 
tgaggcttga tcagtcaccg cagtccacat 
gagaggggag gccatctgca gaagctgtgg 
actgccccct tccagcccct ctcctccaag 
ccacacccag gagctcagca accgctcaga 
agacaatatg aagaattatt tttcctttga 
catataaatg aatgagaact aaaactggtg 
taggctttta aatatatatc tcagccaggt 
ttaggaggcc gaggcgggtg ggtcgtttga 
atggtagaat ctcgtctgta caaaaaagta 
tatagttgca gctacatgag aggctaaggt 
tgctgcagtg agctgtgatc atgccattgc 
gtctaaaaat atatatgtgt ttatatatat 
aaaaattaac taactgctag ctcctaaaac 
gctcagaaaa tgaatttcta agcactccct 
ctggatggtt atctttgaaa cttcctaacc 
acagaactaa actcactgga tgtgaattgc 
aaggctgggc gcagtggctc actcccgtca 
gatcatgagg tcaggagatc gagaccatcc 
aaaaatacag aaaaattagc cggtcgtggt 
aggctgaggc aggagaatgc atgaacccgg 
gccactgcac tccagcctgg gcaacagagt 
aaaaaaacat taaaagcaga ccaagaaaat 
gaacacacag tccagggcat tgcgtttcct 219360
tctctaacca gaaacaaatc atatactttt 219420 
gtgtcaccag gctggagtgt gcagtgatga 219480 
gggttcaagt gattcttctg cctcagcctt 219540 
catgcccagg taatttttgt atttttagta 219600 
ggtctcaaac ccccaacctc aagtgatcct 219660 
acaggcgtga gccaccgtgc ctggccgaaa 219720 
catgtataaa acagagtcat agcctccacc 219780 
aaacacagag agcgcaaatg ccctgtggca 219840 
cggtatccat ggatgtgtcc tcatcatccc 219900 
tggagacttt ttttcctccc gttcatttcc 219960 
tcattctcaa aatatcgcaa tattgaaaaa 220020 
caaatcgaaa aagaagttat tttgtttaaa 220080 
aatcaattaa atttattcaa actggaatat 220140 
gtcttagagt aatttgagcc gggcgtggtg 220200 
gcttggcttg agcccataag ttcaaggctg 220260 
agcctaggca acagagcaag accccatctc 220320 
gtgcctttat attgtttctg tatcttaaca 220380 
gggggtggag aaccaggagt tttttagcca 220440 
tttcctttct ttattattat tagtattttg 220500 
tgaaacgaca gctcttcccc tgggactgca 220560 
ctttgttcct gcacagtctg cctctcaaga 220620 
ttgttagtga ttattttact tgtaagaatt 220680 
agctcgcccg ccaccccagc tgccccacct 220740
aagtcactca acagggctca gtatacaaaa 220800 
ggagaatttc atctgcgccg cgttgcctaa 220860 
gttccctctt cattccctgg agtctttttt 220920 
ttcttgatga atcattcaac cagaaggaga 220980 
gttttttttt tttttttgcg ttttgagaga 221040 
aaataaacgg gaaaacaaaa atcaaattcc 221100 
ctgatattga taatacatat tatatttgaa 221160 
aacactccca tgacatataa ttcccatttt 221220 
gaaggaactt ggctggggtg aggatcacaa 221280 
ggtcctcaag ggtctgcgac atttgtgctg 221340
ctgctgttat cacgaaaggc tggcttggcc 221400 
gcccttttat tgtctgggct ccattcaagt 221460 
acataatgta atattctcaa cagcattgcc 221520 
ttttcttata acttataaag aaaatttggt 221580 
aaaaagatcc gatagcatgc aggccttctc 221640 
taatgactga ctgaccagaa aaatgcacga 221700 
ggcttcgggc ttcctgattc agtagatatg 221760 
ctccattgcc tcgataagga accagtcgca 221820 
agagtggcag agaggaragt gaggacgggg 221880 
gacggcctca ttttatcccc acccaggttt 221940 
aaatgtttgt agaattcaaa gacataattc 222000 
gttgttctta aaacagacga aatctaccag 222060 
ggatttggta atgtcgacat ctgagatgtt 222120 
gcggtggccc atgcctataa tcccagcact 222180 
gcccagcagc tcgagtccag cctgggcaac 222240 
caataattag cgggcatggt ggtgcaagcc 222300 
gggaggatca cctgagctca gggaggtcag 222360 
actccagcct gtgcgacaga gtgagaacct 222420 
atatttatat aaacattagt gggttttaaa 222480 
agtattttgc cattagcttt ggaaaggttt 222540 
tcattgcatt tattggtcaa actaatggtc 222600 
tgttgggtcc ccgtcgttaa acttatgcca 222660 
atcagagatg taaacattta aaagcgtatt 222720 
tcccagcact ttgggaggcc gaagcgggcg 222780 
tggctaacac agtgaaaccc cgtctctatt 222840 
ggcaggtgcc tgtagtccca gctactcagg 222900 
gaggcagagc ttgcagtgag ccgagatcac 222960 
aagactctgt ctcaaaaaaa aaaaaaaaaa 223020 
cctagaatac aggagtcagc tgtctattca 223080 
attcagaata agaaatattg tagacaaggc
ttggtttgag aagtgaaacc agccatgtat 
gaaactttga agactatttt gctgtacaaa 
cttggggtga atgcccaagt gtgtcacagc 
aatggacata aaagaaactt caaagctcaa 
ggctgtcttt gcttactgac cgacttaatt 
tttcctgtgt ctttggagta tgctgaactt 
gtcagtgttg aaactaaagt gacctaaagt 
tctccccaga attcataggt tgaagccttc 
ctagggcctt taaagacata aataaggtaa 
aatatgactg gtgtccttat acgaagagga 
atcccagcac tttgggaggc cagggccggc 
gtgtgtccaa catggtgaaa ccccgtctct 
tgtgggcacc tgcaatccca gctacttggg 
agaggcggag gctgcagtta gtcgtgatcc 
caaaacccca tctcaaaaaa ataaaaataa 
atttttgcac agagaagagt ccaagtgagg 
gagcagtctc ccaggaagcc tcaggagaaa 
tcctgccctc cagaactgtg aaaaaataca 
cattttgtta tggtagcctg agcaaactag 
gcagaaatct gcttttagac agcaggaaac 
atgcagtgaa ttactgaaag acccagaacc 
atcttccttg gtaagaagca ggatcttagg 
atttacacag ccactgactg ttgtggtctc 
aacttcccca gtttggagca agagaaaaaa 
tttagtcaat ttttcttaaa tgttggtgct 
ctgaagaaca caggaggaaa gaaataaaag 
caaggcaaat aagaggctca gcaatagcaa 
ataaatatta aaaatcctga caatgttgaa 
aatactaaga atgaaattgg agctgtcact 
aagagagcat gatgaacaat ttaataccaa 
tttagaaaaa tgttaccaaa attgattcaa 
taaaaaaatt aaataggtaa aatatgtaca 
tcccaacact ttgggaggct gaagtggaca 
cctgaccaac acggtgaaac cctgtcttta 
gggcatgcct gtgatcccag ctacttggga 
gaggtggagg ttgcagtgag ccgagactgt 
caaaactctg tcctaaaaaa aaaaaacaaa 
gaaaatttta aaaagtaaca atttgaaaaa 
accagccact aaaaaggcac ctgtacatgg 
tctaacctct catccaacac agaaaccgct 
ttgcaggtgc tctaaaaggt gctctaaaca 
tgcctgatag aggaaagcca tcttcaagcc 
acccagttcc cagttcccag ttcccttcct 
tgttctagtc tagctgattc atacctgaag 
tgcaaggtgg gcaagaaaaa gaggtgggca 
aggtaaaaat agataaattg cactatatac 
cagtcaacag agtgaaaagc aatctatgga 
aatcttcaca atatataaag aactcctgca 
aactgagcaa agaacttgaa taaacatttc 
gcaaatgaaa agatgcttaa cattactaat 
gagatagcac ctcagcacct cacacccatt 
ccagaaaata acaagtgtta gtaaggatgt 
taatgttggg aatgtaagat attgtagcca 
aaatgaaaag tagaattact gtatgatcca 
aattgaaagc aggatctcaa aagaataatt 
caatagccaa aaggcagaag cccaagtgtt 
gtctatccat acagtggaat attattcacc 
aacactgtgg atgaactttg aaaacatcat 
caaatatcat acgactacac ttataagagg 
actatagttg aattaccaag ggtggagtag 
tggctacaga gactcagttt tggataatga 
ctgcacagca ttgcgaatgt acttcatgcc 
aacattttat gtgtattaga aatgtggtgg 223140
atgctgctcc aagcattttg gttgtggcag 223200 
ttcacaaagc cccctgcaaa cactcccgtg 223260 
tgccttgcag ctctgaggat cagaaaggtt 223320 
cctcctaatg ggaagctgcc cttggtttta 223380 
catgctttgg gttatgactg taggagagat 223440 
gtgtttcttt ttgttgttgc atattagaca 223500 
gacagagctc atgttatggg ctgaattttg 223560 
ccagtcctta gaacatgatt gtatctggag 223620 
catgaggtca taagggcaag gccctaatcc 223680 
agaggccagg cgtggtggct tacgcctata 223740 
agatcacttg aggtcaggag tttgggacca 223800 
actaaaaatg caaaattagc tgggcatggt 223860 
aggctgaggc aggagaatcc cttgaacaca 223920 
caccactgca ctccaacctg tgcaacagag 223980 
aataaaggaa gacaaagaaa caccaaagat 224040
actcagggag aaggtggcca tctgcaaccc 224100 
ctaacccctg tgacaccttg gtcttggact 224160 
tgtctgctgt ttaagccacc caccctgtgg 224220 
ttcagcccaa aatgaattct gatatcacct 224280 
tgagggcctc tgagtttcta ggccagagtc 224340 
ccagtcctgg cccctgattt tcagtttaga 224400 
ctgggcccag caagtggaaa actctttttt 224460 
agactgtacc acagaacctg gtgttccaca 224520 
gtagttggat gaaatgatct cattttattt 224580 
tgaaaacaaa tggatggcag taaagtaatc 224640 
aggcaatacc aaatgttagc aaaatggcag 224700 
aaaactgagt tctttggctg ggaaaaactt 224760 
aaagaaaggc agagataggg ttccaggaga 224820
gcagttatcg taaggatatt ttaaaatcat 224880
taaatttgaa aacaggtaag atggatgatt 224940
gaaatagaaa atctaaacaa gctcaagcgt 225000 
tcaactgggc acagtggctc acgcctgtaa 225060 
gatcacttga ggtcaggaac tagagaccag 225120 
ctaaaaatac aaaatgagcc aggcatgatg 225180 
ggctgaggca ggagaatcgc ttgaacctgg 225240 
gccattgcac tccagcctgg gcaactagag 225300 
aaaaaaacaa ttatatatca acaaaaaaaa 225360 
gtcaaatagg caatcaaaag tattcctttc 225420 
gaatggtagc aaaatgacag aagaggaaac 225480 
aaaaccaggc agaagctgtc tgcagagatg 225540 
accaccaaat gcatacagca accaggcaaa 225600 
cgcaggaaag ttttstggca catggtggca 225660 
caagctgcag ggagcagacc agacatgatt 225720 
gattgatcct catctccatc tcacataaca 225780 
cagctcatga aagccacaga gaggcaatta 225840 
aaattaaaga cttcagtgca tcaaaggata 225900 
ataggagaaa atatttgcaa ataacgggtt 225960 
actcaacaac aaaaaaaaac cccagtttca 226020 
ttcaaaaaag atgatataaa tgtccaatag 226080 
ccttaggaag atgcaaatca aaaccacaat 226140 
atgattgcta ctataaaaaa aaaaaaaaac 226200 
ggaaaattgg aaccttgtgt ctgcctcatg 226260 
cgatagaaaa cagtgtggca gttcatcaaa 226320 
acaattcctc ttctgggtat atgccaaaaa 226380 
gtacatccac atttatagca gcattgttca 226440 
catcagtgga tgcataagaa acaaaatgtg 226500 
cttaaaaagg aaggagattc tgatacatgt 226560 
gttaagtgaa ataagccaga aaccaaagga 226620 
aacttagaat agacaaagtc acagagacaa 226680 
gcaggaaggg agtggagaat tattgtttaa 226740 
gaacattcta gaaattaata gtagtgatgg 226800 
actgaagtgg acacttaaaa atagctaata 226860 
tggtaaattt tatgttatgt ctatcaaact
tagtaagttt taccaaacat tataaagttt 
cattttacaa ggctacattg atcttgacct 
aagtacataa aaatctgagg ctgagcgcag 
aaggccgagg ggggcggatc acaaggtcag 
aaaccccatc tctactaaaa atacaaaaaa 
tcccagctac tcgggaggct gaggcaggag 
agtgagccaa gagtgcgcca ttgcactcca 
gaagaagaaa agaaaaaaaa actcagaaat 
gtgtttttaa aacacacaca cacataacca 
aatgaatagc attaagtctt cttttttcta 
gcagtaggga agattcaatg ccgagtaatg 
aggaatagat aactttctta actatggagg 
aatggtgaaa accttagttt aaggcttaaa 
ctcttcaaca gtgttctatg attcctagtc 
ttatacagga agggacacat tttgtttgca 
aagagagtca ataaactgtt acaactcatt 
actgaaatct ttagcatttg tatacccaat 
atagtaaata atagtggatt caaggctagc 
taatctgaaa tgctccgata tcttaaactt 
aaattccaca cctgacctca tgtgacaggg 
tgatttagcg tccccaaggg aaaaaaaaga 
cttttccacg cacacccaaa ttcccccaca 
cacgtgtgca ggctggacgc acccaacgca 
gaactccatg cattactcac tgtggttttt 
attactgaaa atgtcgaaaa ggcctgcaga 
gaggaataat ttatgtttat caatagcaca 
cagtataagt gtgaagcgtc ttacagaaga 
cctgaagaaa cagaaggata cgcttttgaa 
aatgaaaaat agaaaaactc tacgtaaagc 
aaactagatc tgaaggcatc acactgaacc 
acaagcgaag atctatcctg atgaactgaa 
ctggttgcag aaatttaaga aatgacatgg 
aaggcagcgt cgaaactcat tgacgagttt 
ccagaacaag tctgtattgc tgatgagaca 
atgctgacta cagctgacgg gacagcccct 
actgcagtgc tgtgcaaatg cagcaggcac 
aagcttttgt ccgtgctgtt ttcaaagagt 
caaaaaggca tagatcacca gggacatctt 
ggcctcttgt gctcgctgca gaaaagttgg 
ccttgactac tgttctgctc atcctccagc 
tgtgtacttt cccccaaacg tgacttcatt 
tcaatgraaa gtaaatwtaa aaacactgtc 
ggtgtaggtg tagaagattt tcaggagctg 
aacacttgca acacagtgac taaagacaca 
acgactgtgt tcagtgatga tgatgaacca 
agtgagaaga aaaggatgtc tgacctccaa 
gggaagaagt acacattaat gtcattttta 
tcattgactg ttggggaaat agccagaatg 
accatgaaga tgacgttaac actgcagaaa 
gtgatgggtt aactgaggcc cagagcagcg 
agcttataaa atcaaagaaa gaatcctaag 
catggtgaca cgtgactata gtcccagctg 
agcccaggag ttagaggctg cagtgagccg 
acagcgaggc cctgtctcaa aaaaacccaa 
aaactttgtg tacactgaac caacagaaag 
gtgcagatac cattaaaaag ccccccagca 
tcctgggcct gtaactgctt cttatgttcc 
cgtaagcttt gaatcaaagc acagcatggt 
gttgctgctg ttgttcagca gctgattgcg 
cccctgagca cgtaagtctt cactgtgtta 
cttatgtgtg aataagtgta aggaaatgac 
acgggcaggc acggtggctc acgcctgtaa 
tttaaaggca ccctccacag atagttttag 226920
tacaggaaaa aaaaagaaat ctattcacct 226980
aatactggtt taaaaaactc atttgtaaac 227040
tgactcatgc ctgtaatccc aacactttgg 227100 
gagatcgaga ccatcctggc taacacagtg 227160 
ttagccgggc gtggtggcat gtgcctgtag 227220 
aatcacttaa acctgggaga aagaggttgc 227280 
gcctaggcaa cagagtgaga ctctgtctaa 227340 
aagatatttc atcaagtcaa atttggtagt 227400 
agtgtggttt aacctaagaa tgaaaggata 227460 
atccattaat tttcttagta gtgttaaaaa 227520 
atttaaaaaa aaaaaaactc ttcagaaacc 227580 
ttatctataa aaaacgtaca acaaatattg 227640 
tcaggtacaa gacacacatg aatgctatta 227700 
aagggaataa aataaaaaaa attacaagaa 227760 
tgtcatacag ttgtctacat agaaacatca 227820 
cagcaaaatt cctctttgta agatccactc 227880 
gataaacaat tataaaatgt aacagaaaac 227940
catgtaatac agattgaaca ttcctaattt 228000 
tttgagtgcc aacctgtcaa cacaagtgga 228060 
catagtcaaa gcacaggtgc acgacacagt 228120 
cccacccagc ccccttcaac tatagtataa 228180 
caagcacgcc cacaatgtgt aataaaatgg 228240 
gattccccac gatacctcac gtggggccga 228300 
tgcttattct ctgcagtgtc atgtaaaaat 228360 
tccccctatg tgtaacagtg atcagaaaaa 228420 
aacagtcaac ttgttggagg aactgaacag 228480 
gtatggtgtt gggatgacca ccatacatga 228540 
gttctatgct gaatgtgatg agcagaagtt 228600 
taaaaatgaa gatgtgaata gtgtattgaa 228660 
cgtgccactc agtggtaggc tgatcatgaa 228720
aattgaaggg aactgtgaat attcaacagg 228780 
aattcaagtt ttaaagcatc tgcagatcac 228840 
gccaagatta tcgctaatga aaatctgatg 228900 
tgaccatttg ggtgctactg ccccagaaag 228960 
acaggaatta aggatgccaa ggacagaatg 229020 
gcataagtgt aaacctgctc tcatgggcaa 229080 
aaatttctta ccagtccatt attatgctaa 229140 
ttctgatcgg ttttacaaac acttcgtaca 229200 
accggatgat gacagcaaga ttttcttatg 229260
tgaaattctc atcaaagata atattgatgc 229320 
agttgagcct gtaaccaggg tatctttaga 229380 
ttgaattgca cgctcgcagc agtgaacgga 229440 
agcatgaagg atgccataca tgctgttgcc 229500 
gatgtgcgtg cctggcgtga cctctggcct 229560 
ggtggtggtt tagaagaatt cagcttgtca 229620 
aaaatatacc ttcagagttc atcagtcagc 229680 
acattgataa tgaggctccg gttgttcatt 229740 
gttctgaatc aaggtgatcg tgatgatacg 229800 
aagcacccgt ggacagcgtg gagctcaggt 229860 
tgcattcaca acagaacaag caatcatgtc 229920 
acaaaaaaga aagaaaaaaa attagccggg 229980 
tgtgggaggc tgaggtaaga gtcttgcctg 230040 
tgatcatgcc actgcacacc agcctgggaa 230100 
aaaactaagt aaatattttg tacatgaaac 230160 
cagctgtcgg ttctgagacc attgttagtg 230220 
gaatgcctcc tcgtccccag aggacccact 230280 
ttctcaccta aaatgtaaaa tgccgtgtcc 230340 
tgggagagca gaggcctgct gttgtttgtt 230400
gtctctgctg atgccactgg ctgcttagct 230460 
atggcatgtc ttatttttta ctgtgaagta 230520 
tgcttggtag tagcatataa attcagagtc 230580 
tcccagcact ttgggaggcc aaggcgggca 230640
gatcgcttga ggccaggagt tcaacaccag
ctaaaaatta caaaaattag ccgggcgtga 
gaggctgagg caggggaatc gcttgagcct 
tcaccactgc actccagcct gggtgacaga 
tcacagtcag gaatgagggt gatgccacac 
ctgagatagt gatacctctg ctttctgatg 
caaaatttgt ttgtttattt tttgaaacag 
agtgctgcga tcatagctca ttgcagcctt 
tcagcctccg ttgtagctgg gactacagtc 
tatgcacaga actatttaaa atactgtata 
cagatgaaac ataaatgaat tttggtttta 
tgtccattcc aaaaatgcca cccaccaccc 
ctggtctcca gcattttgga taagggacac 
tggatgggaa acagaagttg gtgtggtagg 
tgtgacacat tcttaggatg tcctagagaa 
tatcagagca gaaagaatat gttgagaact 
aaacaacaca tggtatcagg tcattcattc 
gtcatatgcc gagcatatcc taggcactgc 
tttgcccttg tgggacttaa catttaagag 
aatccttctg gtggtaaatg caatgaagaa 
gtaggcccct tgcacgtgag tggcatttga 
gactcttgta ggtctattgg actggccctt 
gaagcggtaa gtttagcgtg gctgaaatga 
agctggagaa gcaggcagct tcagacagga 
aacagcaaga agtttgggtt ctgttctaag 
tgggaggatt aggctgcggt atatgtttgt 
tggatacaga gtctcccaat gttgcccagg 
ctcctaactc ggcctctcaa agtgctgaca 
gtctcggacc tttgcagtgg ttcaggtgat 
cgatggaggt gttgtgaagt tgtcatattt 
gctgatacat tggatgtggc atattagaga 
tgttctgagc ttccagaatg aggcatctag 
gcagtggttt attcagtctg cagaagcagt 
catgtgatat gagcagatga aaatcacaca 
gactctagga atatttgtaa cctgctgggc 
agccgcagag gctgggtgac caccgtccca 
agtggagatg tggctagatc tcctagccct 
tgaagctcac atgggtcccg tgtgctcttg 
ttggatgctg gcaatcgcag ttttagttaa 
aattaatttt ggtagcatgt tgattctgta 
ctagtaaact ttagcttgtg cgtaaatgcc 
atttaaaaaa aaaaaaaaaa tttttttttt 
agttcagctt cctattcatc tctgtctcca 
gtctacggcc acattttatg ggatgtttga 
tggggatgtg gctatgttca gatgccaaat 
gtgtttacag ttaggaacgt ggggctgtga 
cggcccttgt tcagtggtca gatgcgcttc 
gtcacagccc catgcgcacc tcaacgccac 
agcatgtcct agagcaggcc atgagaggtg 
gtaggcttct gttccatctt gtctctgttt 
ccaccacgat aaatacgtgg atggaaggat 
ttggatggat gggtagacgg gtgcatgggt 
ttgatttcat agtcaaagaa ctcaaacagt 
aacccttcct taactacaat aaagatagaa 
atttatgaat gtaaacatat tatggtcagg 
ttgggaagcc aaggtgagtg gactgcttga 
atagaaagac cgtgtcccta caaaaaaaat 
cctgtagtcc tagctactca ggacgtcaag 
aggctgcagt aagccatgat cataccactg 
tgtctcttta aaaaaaaaaa aaaacatatt 
tattgtttag ggatttgttt ttatatatat 
tatttaaccc tatgcctaga agatccttcc 
ttttaaccat tgggaggtac tgtaattaat 
cctggccaac atggcaaaac cccatctcta 230700
tggcacatgc atgtagtccc agctacttgg 230760 
gggaggtaga gattgcagtg agccaagatt 230820 
gagactgtct caaaaaaaaa aaaaaaaaag 230880 
aaccactgat tgtccacatg ggggtgaggg 230940
gttccatgta cacagacttt gtttcatgca 231000 
agtttggctc tgttgcccag gctggtgtac 231060 
taactcctgg cctcaagcga tcttcccacc 231120 
atgctgtcgc acctggcaat cacaccagtc 231180 
aaattacctc taggctatgt gtataagatg 231240 
gactctggtc ctatcttcaa gatctctcat 231300 
cccaaaaaaa atctggaatt caaaacattt 231360 
accacctgta atatcctttt acacatttcc 231420 
agtcacacat aaacggcaga ctttcttgtc 231480 
gtatcagcga tgtgaatgtc tccagtcaaa 231540 
gctgtattat tagactgggc tactttcttc 231600 
atttacccag tagatatttc ctacacactt 231660 
aggtacagca actgacagga atatacagcc 231720 
agaagacagg cagcaaacaa tttctttaaa 231780 
aacagggtga gtatagagag gaggagtgag 231840 
gctgaggccc agatgatgaa gagaaggatg 231900 
ccaggaatgg taagggctga gaggtcagga 231960 
agggagagaa gacaaagcaa taggaaatga 232020 
ccattccaga ccactgacac cttaacagac 232080 
gataaatgga agtcacagaa cgattttaag 232140 
ttactctgtt tgtgtttatt tttgttttaa 232200 
ctggttttga actcctgggc tcaacggatc 232260 
catgtttttt taatggaagc agagaaagca 232320 
tagtgatggg ggttaggacc agggacgtat 232380 
taaatataca tttcagagcc aggtgcactc 232440 
aagaagactc gaaggtggca cctagtcttg 232500 
aagccaggac ccgggagaag cacggaagga 232560 
gcctacagga ctgctgtgtg aaagaggaca 232620 
gcaggcagct ctgggctcat tatgagaaac 232680 
tctactgcta agggctgcct taagccatga 232740 
cagtgaggga gctgggcaat tccttaccag 232800 
aacatgctta cttattttga taagcaaaga 232860 
aacttctgta cattgtacca ttaaccacac 232920 
ataaagtgac ttgcccacca tactataaaa 232980 
tcctaaccat aagaccacac agagccatgg 233040 
tgccaagacc tgctaaatac tgttgcttac 233100 
aatttaaatt tcacggagct gctcaagggc 233160 
ccggccagga ctggcattac tctaacatct 233220 
ggattattcc tatgaagtga cattggaatt 233280
aaacttggat agaaatcatt tttcctgtgt 233340
ggggctccct ggacatgacc ctggagctgt 233400
agacctccca gagtgctgcc cgcacactca 233460 
tgctcagaag tccagtgtaa ttcctcaggc 233520 
taaggtacag actttgttgt gaggttacat 233580 
aaagatcgat acttctggca gcctttatcc 233640 
acatgcgtgg aagggtggat gggtggatgg 233700 
agatgggtag atgggtggat ggagtgatat 233760 
agacaagtac acagggtcct ccagtcttac 233820 
gtgtatcttc tagatttctt ttaaaaacat 233880 
tccagtgact cacatgtata atctaacact 233940 
tgccgggagt ttgagaccag ccttggcaac 234000 
tttaaaagta gcctggtgtc atggcacatg 234060 
gtgggaggat cactcgagga caggaattcc 234120 
cactccagtc tgggcaatgg atcaagatcc 234180 
tacatagaaa taaatgtata taaacacaga 234240
tggagaatga catgcttttt caggagcttt 234300 
agcttaacac atatacagct acttcattct 234360 
ttatgtgctt tctgttattt tcattgtttt 234420 
gctattgtat ttacttattt attttagaaa
tcaaactctt gggctcaagc agtcctccca 
gcatgaaccc tgcaatacgg ctggcttctg 
tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg 
tgtgtgtttt ctaactgaac aatctgaatt 
tattctagtc cgagcccagg ctcatgaaga 
ttactgtgcc ctaggatatg tgtacttaaa 
ctattactaa cggtacataa agtcctattt 
ctggacataa cccatatttt caatattggg 
ttaatttcta attgcctgat tactaaatta 
tttatcattc gtgaattacc tgtcctgatc 
atttttcaca tggttttaca gcaatgttta 
tataaaactc tgtctcttta gctgtgctta 
ttttaactat tattttttac aaaattaaac 
tggttttgct tagaaaggcc ttcctcaccc 
ctatttatag ttttaaaaaa tatttagacc 
aatgtgaggg gaccatgttg tttttaataa 
cctgtgagtc atctcttaca ttcccacatg 
attgatctgt ttgtctattc tgtgctgacc 
ttttgatatc tggtatgtca aattttttct 
tatgcatttt tttctttcct ataaacttta 
tttgaaattt ttggattaca ttgaatttct 
tgcatttttt tatgattttt caaaactgac 
aatccataga gtttttacaa cctggaagaa 
ctgagtatcg gaaaaggctt ccacctacct 
agtgttgata taatttgaat acataagaaa 
aagaaaagtc aagccaggtg tggtgggtta 
aggcacagga agattgcttg agcccaggag 
cattggctct acaaaaaatc aaaacattaa 
ggctacctgg gaagctgagt ctggaggatc 
agtcatgttt gcaccaatgc agtctaacct 
aagaaaaaat atgtaaatca taataatacc 
tatgcctagt actaataata ttgttataat 
ttatatatat aagctatcac aaatgttagt 
ccctcactga cccaggcctc ctgggtagaa 
attctgggac actatcttga gctataacta 
aaatatttgt agattaaacc catttttttc 
ccaccttcca tcactcatct gtgtgacttc 
ataattatat attcacacaa tcattgtgat 
atcatccaat cggtgctgac agtggatttc 
aatgcagtgc gcttgccagg actgaggaaa 
gatcacctgt gctgaccctt cagcagcacc 
gtgattttca taaaatagtc gagtttcaaa 
agctatgaag aacagagttt tagaaagtat 
cctatgctgg gtagatagga tagcacggcc 
ctcatatatg tatttactta tactctgcct 
agattagaaa ttcttggctc ctatgtcaca 
gtgtcctgac acaacgggaa cgtgccctgc 
tgaactggcc agtgtgactg agaactgaat 
aatttaaaaa ggtatgtgtg ctctatggac 
ggctatttgt ttttaaatat agtttcatgt 
ggctggctat tatgaatgct aaactgcttt 
acaaggtctg tgccyctgaa aactacctat 
tggtgactgc acacagctcg caaaactgtc 
gaagagaagg aactggcgta tacaagatga 
gttcttaaga actcataggt gactttctga 
ggccttttta tttttatttt attttttatt 
ggctggagtg caatggcaca atctcggctc 
attctgctgc ctcagcctcc tgagtagctg 
taatttttgt agttttagta gagacagggt 
gcctgagctc aggtgatctg tcaggcctct 
catgatcata attgaaaggt cacagaacct 
agaatggaaa ttcaatgaca ttgttttatt 
caagatgtca ctatgttgcc caggctggcc 234480
ccttagcctc ccaagtaggc gggactacag 234540 
ctattttaaa ctcgtgtgtg tgtgtgtgtg 234600 
cgcgcgcgtg tgtgtgtgtg tgcgtgtgtg 234660 
caattttaag agattttctt gagctggaat 234720 
tttctgtaaa atacattcca agcagtgaaa 234780 
ttctgataca caaggctgca gcaatttaca 234840
cctatgtcct ataaattccc atgtccagta 234900 
tgatccgatt agttaaaaaa atagatctca 234960 
tgaatgagtc tgaatatctt agataggaga 235020 
ccttaactgt tttgaaattg ggttatttat 235080 
cataatatgg acattaaact tttgttgtgt 235140 
tggtgtctta agtattacca agtttttaat 235200 
acctcttttc ctccatggca cctacccttg 235260 
tctgagcttt aaaaataatc tcatattctc 235320 
tttaatgcat gtgcatttca cttactgtat 235380 
ctaatttatt gacactgacc tatattgccc 235440 
gtatgggtgt gtttctggtt attctcgtcc 235500 
tctattttac tgctataatt gtacagactg 235560 
catcatttct ctttttaaaa atcatcttcc 235620 
gaataaacat gtcgttttct ttttgaaaag 235680 
agatgaattt ggaaagagca tcattttttc 235740 
acctagtcag aaaactaagt gtaaaaattg 235800 
aatacaaatg tggctgaatg actttaaacc 235860 
atgactcaaa agccagatgc aataagacaa 235920 
ttgaaactta tacatggcaa aagtttgcat 235980 
tgtctataat cccagcattt tggaagactg 236040
ttcgagatca gcctgggcaa caaagtgaga 236100 
ctgggtgtgg tggtgcatac ctgtagtccc 236160 
acctgagtcc aggagactga ggctgcagtg 236220 
gcgtgactga gcaagaccct atctcaaaaa 236280 
tgcttcactg ttgtggagag aattaagtag 236340 
tatatacaat gtttttaact atatcatttc 236400 
gttcctccct tctgaaattc atctgagggt 236460 
gcacatttgt attgagaaga caacagttaa 236520 
agataagtca tttttttctt ccatttctaa 236580 
ttttttgtac cataccacca ggatagcttt 236640 
ttaagttcct tcaaatgtaa ctctgtaatt 236700 
tctttaattg caattgattt aatctacctt 236760 
attccttttt ttttctaaca gtaggaatag 236820 
gagggagggg ttgtttccgc cagctgccag 236880 
tgcagcgcta tcctgggcca ggcgcaactt 236940
cggatgggac tttagagctt ctttaatttg 237000
gcttattcac ttggaattcc ataaaaaata 237060
tacctctcac cactggtgtc ataattaaaa 237120 
tatgccaaga gtactggaag tggtgagcta 237180 
gactggcaag cttcccaccc tgcccactga 237240 
atctaatggg acatgtggct accaagcact 237300 
gtttcattgt attgaatttc gtttcacgtt 237360 
gtgggggggc ctatggacaa cacagctctt 237420 
atatacaaac aggttatcac tttcctatgt 237480 
tcgctctctc tctagattcc atcacccagc 237540 
tgtcacaatg acagtgacct cactggcctg 237600 
tttggatgtt caaatgagaa acaaaactgt 237660 
cttctgatat catgtttgcc atgtgttgtg 237720 
tgactgaatg tctgtttcag agacgcttcg 237780 
ttttgagacg gagtcctgcc ctgtttccca 237840 
actgcaacct ccacctccca ggttcaagcg 237900 
ggattacaga tgtgtgccac catgcctggc 237960 
ttcgccatgt tggccaggct ggtctcaaac 238020 
tctatagaat tccagtcttt gtgtcttagt 238080 
ttgtcattag agcacagtac tgccaaataa 238140 
actgagaaca actagagaac tctgcaagtt 238200 
tcttggctta
agctattaag 
ggtatggggg 
tgctttatgg 
gcctgccgtc 
ggctgagcac 
tcacttgagg 
aaaatacaaa 
ctgaggcaga 
tgaatcaaga 
taaataaata 
ctaggaagtc 
aaaagaagtc 
agactctgcc 
aatccatgta 
aagaacgcaa 
aggaggcaaa 
ggaggtgcca 
tcatctcagc 
cttctagaat 
atagaaaacc 
tacacctaaa 
tgagtcattc 
cctggcttgg 
ctgactcatg 
gccagcccag 
gctggcctgc 
gactgttgct 
caatcctatt 
aggttctcag 
ttattcaacg 
tacagattgg 
agaaattgta 
gtttagcaag 
cagaattgta 
ccatttaaaa 
atctgtatac 
gacatataca 
tgacgtatag 
agctgatgct 
taagagcaaa 
aattacaaag 
ctgaggcggg 
accttgtctc 
gactcgatct
gtgactttta 
gtgttataca 
tgagcaaagc 
ctcgagatga 
ggtggctcac 
tcaggagttc 
aatcagctgg 
agaatcgctt 
ttgctccact 
aataaataag 
ctagccagag 
aaactgtctc 
aaaaggctcc 
caaaaatcag 
tcccatttcc 
ggatctctac 
ctgagccctc 
tgttgttcct 
gaacctttga 
acagaaactg 
tcatcccctt 
ctgagttcac 
tagtgctccc 
ctggagacaa 
cctctgccca 
cagtggccag 
agacattaca 
agtagcttac 
gattctgaat 
ttgcattaaa 
aagtgaagaa 
aggaatgcaa 
attgcaatat 
tttatatata 
ttgcatctaa 
tgaaacctgt 
aagttcatag 
attcaatgca 
aaatctttta 
gttggaggat 
tatcagccag 
tggatcactt 
tactagaaat 



ttattaatac 
tctagcggag 
ggataattgg 
atcaccagca 
aattggcagt 
acctgtcatc 
gagaccagcc 
gtgtggtggc 
gaacccaaga 
gcactccagc 
aaacactgat 
tgatcaggca 
tcttcactgc 
tggaaccgat 
catttccaaa 
aatagccacg 
aaggagaacc 
atcgtggtgc 
acctcaaatt 
gaagggaggt 
gaaataatga 
atgatactca 
tcgcttcaca 
acagcaccaa 
gccacagaga 
cccaggcctc 
tcaagattct 
ctaagcacat 
tgtgggtctg 
tttaaaaaaa 
gtttctaccc 
ataaaagtct 
aaaaaaaaaa 
acaatcttgc 
ctgtcaataa 
aaataaacaa 
aaaacattgc 
attagaagat 
atccattaaa 
tgaaaatgca 
ttatagaacc 
gtgccgtggc 
gaagtcggga 
acaaaaaatt 




attatctatt 
attcctctct 
tgacatctga 
agtgatcaca 
tggggctgat 
ccagcccttt 
tgaccaacgc 
acacacctgt 
ggcagaggtt 
ctgggcaaca 
gtgtctgtca 
agaataagcc 
cgatatgatt 
aaatgactta 
cacagtaaca 
gaatgaaata 
ataaacgaga 
cgttcccgct 
tcaagtccct 
agcagtgcat 
agggttgtct 
tcctctaaca 
ttacatatgt 
aaatccctga 
acttccatcc 
agtccccagt 
ctttctgaaa 
tatatgttgt 
caaagcctta 
atttgtaaag 
agagaaggca 
ttattctcaa 
aaaaaaaagc 
aatcttccta 
gcaattcaaa 
aataggaata 
tgaaagaagt 
gcaatattgt 
atctcagatg 
aagaacctct 
tgattccaaa 
tcacatctgt 
gtttaagacc 
agccaggcat 



aggtaggaaa 
taaagtaatg 
gtgtcttact 
atgtccactg 
tcacagaaac 
gggaggctga 
agcaaaaccc 
ggtcccagct 
gcagtgagcc 
gagtaactct 
ccttctaaag 
ataaaaggca 
ctatacctag 
agtaaagttt 
ttcaagctga 
cctaggaaca 
tgctgagtcc 
ctgggttatt 
caacaaatat 
tgtataggaa 
cttggtttta 
gcaattgaac 
ttctctataa 
ggaggctgac
cccaccacat 
gttaagttct 
gctagtattt 
acttcatttt 
ctcaaaacat 
gcttatggct 
ataaaaggaa 
gaatacaaga 
cctacaagaa 
aagattatat 
atgaaattaa 
gacttggcaa 
taaagacttc 
taagatgata 
gctttttata 
agtagacaaa 
actgtcagta 
aataccagct 
agcctggcca 
gatgg 



gacatttgtc 
aaaggagata 
tctgcaagcc 
gccgcttttt 
accgatttgt 
ggtggacaga 
atctctacta 
cctcaggagt 
aaggttgcag 
ccttctcaaa 
aaatgaaatg 
tccaaatagg 
aaaaccctaa 
caggatagta 
gcaccaaatc 
cgtataacca 
cagcgaggtc 
tatctgttgc 
aacagaacca 
ttggcattct 
aaataatgta 
ttcaatacaa 
ccacaagcat 
aaacattgtg 
cagccacgga 
gatccctgat 
tatgaggact 
accctttcaa 
atagggctag 
ctcaccactg 
attaaagcta 
cactatgtat 
cttataacaa 
acaaacctaa 
gaccacgatt 
cagttgtaac 
tttaaataga 
gtcctcaaat 
gaatttgaaa 
acaatttttt 
aaactacaat 
ctctgggagg 
acttggtgaa 



238260 
238320 
238380 
238440 
238500 
238560 
238620 
238680 
238740 
238800 
238860 
238920 
238980 
239040 
239100 
239160 
239220 
239280 
239340 
239400 
239460 
239520 
239580 
239640 
239700 
239760 
239820 
239880 
239940 
240000 
240060 
240120 
240180 
240240 
240300 
240360 
240420 
240480 
240540 
240600 
240660 
240720 
240780 
240825 



<210> 2 
<211> 3809 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> 5'UTR 
<222> 1..57

<220> 
<221> CDS 
<222> 58..2565



<220> 

<221> 3'UTR 
<222> 2566..3809



<220> 
<221> polyA_signal
<222> 3795..3800

<220> 

<221> allele 
<222> 285 
<223> 5-392-222 

<220> 

<221> allele 
<222> 968 
<223> 4-58-318 

<220> 

<221> allele 
<222> 997 
<223> 4-58-289 

<220> 

<221> allele 
<222> 2102 
<223> 5-398-203 

<220> 

<221> allele 
<222> 2283 
<223> 5-400-175 

<220> 

<221> allele 
<222> 2339 
<223> 5-400-231 

<220> 

<221> allele 
<222> 2475 
<223> 5-400-367 

<220> 

<221> allele 
<222> 2539 
<223> 5-402-144 

<220> 

<221> variation 
<222> 345 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 615 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 663 

<223> polymorphic base T or C 
<220> 

<221> variation 
<222> 666 

<223> polymorphic base T or C 



<223> polymorphic base G or T



<223> polymorphic base G or T



<223> polymorphic base G or C



<223> polymorphic base A or C



<223> polymorphic base C or T



<223> polymorphic base C or T



<223> polymorphic base A or C



<223> polymorphic base C or T
<220>

<221> variation 
<222> 853 

<223> polymorphic base T or C 
<220> 

<221> variation 
<222> 989 

<223> polymorphic base T or C 
<220> 

<221> variation 
<222> 1309 

<223> polymorphic base T or C 
<220> 

<221> variation 
<222> 1472 

<223> polymorphic base A or C 
<220> 

<221> variation 
<222> 1839 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 1913 

<223> polymorphic base T or C 
<220> 

<221> variation 
<222> 1998 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2319 

<223> polymorphic base T or C 
<220> 

<221> variation 
<222> 2359 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2404 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2423 

<223> polymorphic base T or C 
<220> 

<221> variation 
<222> 2454 

<223> polymorphic base T or C 
<220> 

<221> variation 
<222> 2497 
<223> polymorphic base A or G
<220> 

<221> variation 
<222> 2499 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2533 

<223> polymorphic base T or C 
<220> 

<221> variation 
<222> 2665 

<223> polymorphic base T or C 
<220> 

<221> variation 
<222> 2768 

<223> insertion of T 
<220> 

<221> variation
<222> 2855 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2858 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2867 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2870 

<223> polymorphic base T or A 
<220> 

<221> variation 
<222> 2874 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2881 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2882 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2898 

<223> polymorphic base A or G



<220> 
<221> variation
<222> 2910 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2933 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2946 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2957 

<223> polymorphic base T or C 
<220> 

<221> variation
<222> 2961 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 2981 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 3001 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 3006 

<223> polymorphic base T or C 
<220> 

<221> variation 
<222> 3015 

<223> polymorphic base A or G 
<220> 

<221> variation 
<222> 3027 

<223> polymorphic base A or G 
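
The allele positions in the feature table above are expressed in SEQ ID No. 2 coordinates. As a cross-check, the following Python sketch maps each position to a region and, inside the coding sequence, to a residue number and codon position. It is a hypothetical helper, not part of the original listing; it assumes only the <211> length (3809) and the CDS bounds (58..2565) stated in the table, and the names `locate` and `ALLELES` are illustrative.

```python
from typing import Optional, Tuple

# Feature bounds copied from the SEQ ID No. 2 feature table.
SEQ_LEN = 3809                 # <211> 3809
CDS_START, CDS_END = 58, 2565  # <221> CDS, <222> 58..2565


def locate(pos: int) -> Tuple[str, Optional[int], Optional[int]]:
    """Map a 1-based position in SEQ ID No. 2 to (region, residue, codon_position).

    residue and codon_position are None outside the CDS.
    """
    if not 1 <= pos <= SEQ_LEN:
        raise ValueError(f"position {pos} is outside 1..{SEQ_LEN}")
    if pos < CDS_START:
        return ("5'UTR", None, None)
    if pos > CDS_END:
        return ("3'UTR", None, None)
    offset = pos - CDS_START  # 0-based offset from the A of the initiation codon
    return ("CDS", offset // 3 + 1, offset % 3 + 1)


# <221> allele features: biallelic marker name -> position in SEQ ID No. 2.
ALLELES = {
    "5-392-222": 285,
    "4-58-318": 968,
    "4-58-289": 997,
    "5-398-203": 2102,
    "5-400-175": 2283,
    "5-400-231": 2339,
    "5-400-367": 2475,
    "5-402-144": 2539,
}

if __name__ == "__main__":
    for name, pos in ALLELES.items():
        region, residue, codon_pos = locate(pos)
        print(f"{name}\t{pos}\t{region}\tresidue {residue}\tcodon position {codon_pos}")
```

Under these assumptions, marker 4-58-318 (position 968) falls on the second base of codon 304, which the listing prints as aka (Xaa), and marker 4-58-289 (position 997) falls on the first base of codon 314 (sac, Xaa). All eight allele positions lie inside the CDS, which spans 2508 nucleotides (836 codons).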



<400> 2 

gcgccgccag gctcgcaagc accgcgtagg ccagctggcc ggatcccgcc gtctgtc 57 



atg gcg gcc ccc atc ctg aaa gat gta gtg gcc tat gtt gaa gtg tgg      105
Met Ala Ala Pro Ile Leu Lys Asp Val Val Ala Tyr Val Glu Val Trp
1               5                   10                  15

tca tcc aat gga aca gaa aat tat tca aag aca ttt aca aca cag ctt      153
Ser Ser Asn Gly Thr Glu Asn Tyr Ser Lys Thr Phe Thr Thr Gln Leu
            20                  25                  30

gtg gat atg ggg gca aag gtt tca aaa act ttt aac aaa caa gta act      201
Val Asp Met Gly Ala Lys Val Ser Lys Thr Phe Asn Lys Gln Val Thr
        35                  40                  45

cac gtt atc ttc aaa gat ggc tac cag agc act tgg gac aaa gct cag      249
His Val Ile Phe Lys Asp Gly Tyr Gln Ser Thr Trp Asp Lys Ala Gln
    50                  55                  60

aag aga ggc gta aag ctc gtt tcg gtg ctc tgg gtk gaa aaa tgc agg      297
Lys Arg Gly Val Lys Leu Val Ser Val Leu Trp Val Glu Lys Cys Arg
65                  70                  75                  80

aca gct gga gca cac att gat gaa tca ttg ttc cct gca gct aat atg      345
Thr Ala Gly Ala His Ile Asp Glu Ser Leu Phe Pro Ala Ala Asn Met
                85                  90                  95

aat gaa cac tta tca agc cta att aaa aaa aaa cgt aaa tgt atg cag      393
Asn Glu His Leu Ser Ser Leu Ile Lys Lys Lys Arg Lys Cys Met Gln
            100                 105                 110

ccc aaa gat ttt aat ttt aaa aca cca gaa aat gat aag aga ttt cag      441
Pro Lys Asp Phe Asn Phe Lys Thr Pro Glu Asn Asp Lys Arg Phe Gln
        115                 120                 125

aag aaa ttt gag aaa atg gct aaa gag cta caa agg caa aaa aca aat      489
Lys Lys Phe Glu Lys Met Ala Lys Glu Leu Gln Arg Gln Lys Thr Asn
    130                 135                 140

cta gat gat gat gta cct att ctc tta ttt gaa tct aat ggt tca tta      537
Leu Asp Asp Asp Val Pro Ile Leu Leu Phe Glu Ser Asn Gly Ser Leu
145                 150                 155                 160

ata tat act ccc aca att gaa att aat agt agt cac cac agc gca atg      585
Ile Tyr Thr Pro Thr Ile Glu Ile Asn Ser Ser His His Ser Ala Met
                165                 170                 175

gag aag aga tta caa gag atg aag gag aaa agg gaa aat ctt tcc ccc      633
Glu Lys Arg Leu Gln Glu Met Lys Glu Lys Arg Glu Asn Leu Ser Pro
            180                 185                 190

acc tct tcc caa atg att cag cag tct cat gat aat cca agt aac tct      681
Thr Ser Ser Gln Met Ile Gln Gln Ser His Asp Asn Pro Ser Asn Ser
        195                 200                 205

ctg tgt gaa gca cct ttg aac att tca cgt gat act ttg tgt tca gat      729
Leu Cys Glu Ala Pro Leu Asn Ile Ser Arg Asp Thr Leu Cys Ser Asp
    210                 215                 220

gaa tac ttt gct ggt ggc tta cac tca tct ttt gat gat ctt tgt gga      777
Glu Tyr Phe Ala Gly Gly Leu His Ser Ser Phe Asp Asp Leu Cys Gly
225                 230                 235                 240

aac tca gga tgt gga aat cag gaa agg aag ttg gaa gga tcc att aat      825
Asn Ser Gly Cys Gly Asn Gln Glu Arg Lys Leu Glu Gly Ser Ile Asn
                245                 250                 255

gac att aaa agt gat gtg tgt att tct tca ctt gta ttg aaa gca aat      873
Asp Ile Lys Ser Asp Val Cys Ile Ser Ser Leu Val Leu Lys Ala Asn
            260                 265                 270

aat att cat tca tca cca tct ttc act cac ctc gat aaa tca agt cct      921
Asn Ile His Ser Ser Pro Ser Phe Thr His Leu Asp Lys Ser Ser Pro
        275                 280                 285

cag aaa ttt ctg agt aat ctt tca aag gaa gaa ata aac ttg caa aka      969
Gln Lys Phe Leu Ser Asn Leu Ser Lys Glu Glu Ile Asn Leu Gln Xaa
    290                 295                 300

aat att gca ggt aaa gta gtc acc cct sac caa aag cag gct gca ggt     1017
Asn Ile Ala Gly Lys Val Val Thr Pro Xaa Gln Lys Gln Ala Ala Gly
305                 310                 315                 320

atg tct cag gag acg ttt gaa gag aag tat cgt ttg tct cct acc tta     1065
Met Ser Gln Glu Thr Phe Glu Glu Lys Tyr Arg Leu Ser Pro Thr Leu
                325                 330                 335

tct tca aca aaa ggc cac ctt ttg ata cat tca aga ccc agg agt tcc     1113
Ser Ser Thr Lys Gly His Leu Leu Ile His Ser Arg Pro Arg Ser Ser
            340                 345                 350

tca gta aag aga aaa aga gta tca cat ggc tcc cat tca cct ccg aag     1161
Ser Val Lys Arg Lys Arg Val Ser His Gly Ser His Ser Pro Pro Lys
        355                 360                 365

gaa aaa tgc aag aga aag agg agc acc agg aga tct atc atg ccg agg     1209
Glu Lys Cys Lys Arg Lys Arg Ser Thr Arg Arg Ser Ile Met Pro Arg
    370                 375                 380

ctg cag ctg tgc agg tcg gaa ggc agg ctg cag cac gtg gcg gga cct     1257
Leu Gln Leu Cys Arg Ser Glu Gly Arg Leu Gln His Val Ala Gly Pro
385                 390                 395                 400
gcc ctg gag
Ala Leu Glu

tca cct gat
Ser Pro Asp

tct cag ctg
Ser Gln Leu
435 

aag aag gag 
Lys Lys Glu 

450 
ggc aaa aaa 
Gly Lys Lys 
465 

atc tcc agt
Ile Ser Ser

agt tgc gtg 
Ser Cys Val 

gct ggg aaa
Ala Gly Lys
515

att gag gac
Ile Glu Asp

530 
ttg gaa gga 
Leu Glu Gly 
545 

aca cag aac 
Thr Gln Asn

gaa gcc cag
Glu Ala Gln

gag acg tct 
Glu Thr Ser 
595 

agt gtt aaa 
Ser Val Lys 

610 
gac ggc ttt 
Asp Gly Phe 
625 

ggg aga ggc 
Gly Arg Gly 

tct gaa aag 
Ser Glu Lys 

ttt tca att
Phe Ser Ile
675 

ggg aag cca 
Gly Lys Pro 

690 
tgc tgg gtt 
Cys Trp Val 
705 

cac tgg att 
His Trp Ile



gct ctt agc tgt ggg
Ala Leu Ser Cys Gly 
405 

aat ctt aag gaa agg 
Asn Leu Lys Glu Arg 
420 

cca tca agc cct gct
Pro Ser Ser Pro Ala 
440 

aga aca agc ata ttt
Arg Thr Ser Ile Phe
455 

acc aga aca gtt gac 
Thr Arg Thr Val Asp 
470 

cct cgg aaa act gga
Pro Arg Lys Thr Gly 
485 

act tct gcc cct gaa 
Thr Ser Ala Pro Glu 
500 

gaa gac gca tgc cca 
Glu Asp Ala Cys Pro 
520 

cct gct ctt cca aaa
Pro Ala Leu Pro Lys 
535 

agc ctt gaa gaa atg
Ser Leu Glu Glu Met 
550 

aaa ggt acc act tcc
Lys Gly Thr Thr Ser 
565 

agt gaa cat gag cca 
Ser Glu His Glu Pro 
580 

aca gaa gag aag gaa 
Thr Glu Glu Lys Glu 
600 

aat aga cca aca agg 
Asn Arg Pro Thr Arg 
615 

aag gac ctc atc aaa
Lys Asp Leu Ile Lys
630 

aaa aag cca aca aga 
Lys Lys Pro Thr Arg 
645 

cag aat gtc gtc atc
Gln Asn Val Val Ile
660 

gca cca gac gtc tgt 
Ala Pro Asp Val Cys 
680 

ctt cgc acc ctg aat 
Leu Arg Thr Leu Asn 
695 

ctc tct tat gat tgg
Leu Ser Tyr Asp Trp 
710 

tct gag gag ccg ttc 
Ser Glu Glu Pro Phe 
725 






gag 


tct 


tca


tat 


gat 


Glu 


Ser 


Ser 


Tyr 


Asp 




410 








tat 


tca


gag 


aat 


ctt 


Tyr 


Ser 


Glu 


Asn 


Leu 


425 










cag 


ttg 


aac 


tgc


aga


Gln


Leu 


Ser 


Cys 


Arg 










445 


gaa 


atg 


tct 


gat 


ttt 


Glu 


Met 


Ser 


Asp 


Phe 








460 




att 


acc 


aat 


ttc 


aca 


Ile


Thr 


Asn 


Phe 


Thr 






475 






aat 


aat 


aaa 


aac 

33 v 


cat 


Asn 


Gly


Glu


Gly


Arg




490 








aaa 


acc 


cta


aaa 

"33 


tat 


Glu 


Ala 


Leu 


Arg


Cys 


505 










aaa 


aaa 


aat 


aac


ttt 


Glu 


Gly


Asn


Gly


Phe 










525 


aaa 


cat 


gat 


aat 
you 


aat 


Gly


His 


Asp 


Asp 


Asp 








540 




aaa 


gaa 


aca 


gtt 


aat 
yy »- 


Lys


Glu 


Ala 


Val 


Gly






555 






aaa 


ata 


tca


aac


tcc


Lys


Ile


Ser 


Asn 


Ser 




570 








tat 


ttt 


ata 


att 

3 ut 


gac 


Cys 


Phe 


Ile


Val 


Asp 


585 










aac 


tta 


ccc 


yy d 


aaa 


Asn 


Leu 


Pro 


Gly


Gly










605 


cat 


aat 


att 

y 


tta 


gat 


His 


Asp 


Val 


Leu 


Asp 








620 




cct 


cat 


aaa 


aaa 


tta 


Pro 


His 


Glu 


Glu 


Leu 






635 






aca 


tta 


atc


ata 


aca 


Thr 


Leu 


Val 


Met 


Thr 




650 








cag 


gtt 


gtg 


gat 


aaa 


Gln


Val 


Val 


Asp 


Lys 


665 










gag 


amc 


acg 


act 


cac 


Glu 


Xaa 


Thr 


Thr 


His 










685 


ata 


ctg 


cta


aaa 
yy« 


att 


Val 


Leu 


Leu 


Gly 


Ile








700 




gtg 


cta


tgg 


tct 


tta 


Val 


Leu 


Trp 


Ser 


Leu 






715 






gaa 


ctg 


tct 


cac 


cac 


Glu 


Leu 


Ser 


His 


His 



730 



gac tat ttt 1305 
Asp Tyr Phe 
415 

cct cct gaa 1353 

Pro Pro Glu 

430 

agt ctt tct 1401 
Ser Leu Ser 

tcc tgc gtt 1449
Ser Cys Val

gca aaa acc 1497
Ala Lys Thr
480

gca act tcg 1545
Ala Thr Ser
495

tgt aga cag 1593

Cys Arg Gln

510 

tct tac acc 1641 
Ser Tyr Thr 

tta act cct 1689 
Leu Thr Pro 

ctg aaa agc 1737
Leu Lys Ser 
560 

tct gaa ggc 1785 
Ser Glu Gly 
575 

tgt aac atg 1833 

Cys Asn Met 

590 

tac agt gga 1881 
Tyr Ser Gly 

gac tca tgt 1929
Asp Ser Cys 

aag aaa agt 1977 
Lys Lys Ser 
640 

agc atg cca 2025
Ser Met Pro 
655 

ttg aaa ggc 2073 

Leu Lys Gly 

670 

gtg ctt tcc 2121
Val Leu Ser 

gcg cgt ggc 2169 
Ala Arg Gly 

gaa ttg ggt 2217 
Glu Leu Gly 
720 

ttc cct gca 2265 
Phe Pro Ala 
735 



WO 01/14550 



PCT/IB00/01098 




gct ccc ctg tgc cga agy gag tgc cac ttg tct gca ggg ccg tac cgc 2313
Ala Pro Leu Cys Arg Ser Glu Cys His Leu Ser Ala Gly Pro Tyr Arg 
gga acc ctc ttt gcc gac cag cca gyg atg ttt gtc tcg cct gcc agc 2361
Gly Thr Leu Phe Ala Asp Gln Pro Xaa Met Phe Val Ser Pro Ala Ser
acaagatgac ttctgatatc atgtttgcca tgtgttgtgg ttcttaagaa ctcataggtg 2725
actttctgat gactgaatgt ctgtttcaga gacgcttcgg gcctttttat ttttatttta 2785
ttttttattt tttgagacgg agtcctgccc tgtttcccag gctggagtgc aatggcacaa 2845
tctcggctca ctgcaacctc cacctcccag gttcaagcga ttctgctgcc tcagcctcct 2905
gagtagctgg gattacagat gtgtgccacc atgcctggct aatttttgta gttttagtag 2965 
agacagggtt tcgccatgtt ggccaggctg gtctcaaacg cctgagctca ggtgatctgt 3025 
caggcctctt ctatagaatt ccagtctttg tgtcttagtc atgatcataa ttgaaaggtc 3085 
acagaacctt tgtcattaga gcacagtact gccaaataaa gaatggaaat tcaatgacat 3145
tgttttatta ctgagaacaa ctagagaact ctgcaagttt cttggcttag actcgatctt 3205 
tattaataca ttatctatta ggtaggaaag acatttgtca gctattaagg tgacttttat 3265 
ctagcggaga ttcctctctt aaagtaatga aaggagatag gtatgggggg tgttatacag 3325
gataattggt gacatctgag tgtcttactt ctgcaagcct gctttatggt gagcaaagca 3385 
tcaccagcaa gtgatcacaa tgtccactgg ccgctttttg cctgccgtcc tcgagatgaa 3445 
attggcagtt ggggctgatt cacagaaaca ccgatttgtg gctgagcacg gtggctcaca 3505 
cctgtcatcc cagccctttg ggaggctgag gtggacagat cacttgaggt caggagttcg 3565 
agaccagcct gaccaacgca gcaaaaccca tctctactaa aaatacaaaa atcagctggg 3625 
tgtggtggca cacacctgtg gtcccagctc ctcaggagtc tgaggcagaa gaatcgcttg 3685 
aacccaagag gcagaggttg cagtgagcca aggttgcagt gaatcaagat tgctccactg 3745
cactccagcc tgggcaacag agtaactctc cttctcaaat aaataaataa ataaataaga 3805 
aaca 3809
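The sequence listing above pairs each codon with a three-letter residue and uses IUPAC ambiguity codes at variable positions (for example y = c or t, m = a or c). A minimal Python sketch of how such a pairing can be checked, assuming the standard genetic code and assuming that Xaa marks a codon whose ambiguity code allows more than one residue; the helper names are illustrative and not part of the disclosure:

```python
from itertools import product

# Standard genetic code (assumption: no alternative code applies).
# Bases are ordered t, c, a, g with the first codon position varying slowest.
BASES = "tcag"
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: ONE_LETTER[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

# Subset of the IUPAC nucleotide ambiguity codes.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "n": "acgt",
}


def translate_codon(codon: str) -> str:
    """Return the three-letter residue for a codon, or 'Xaa' when the
    ambiguity codes expand to codons encoding different residues."""
    choices = [IUPAC[base] for base in codon.lower()]
    # Expand every ambiguous position and translate each concrete codon.
    residues = {CODON_TABLE["".join(p)] for p in product(*choices)}
    if len(residues) == 1:
        return THREE_LETTER[next(iter(residues))]
    return "Xaa"


def translate_row(row: str) -> list[str]:
    """Translate a space-separated row of codons."""
    return [translate_codon(c) for c in row.split()]


row_2361 = "gga acc ctc ttt gcc gac cag cca gyg atg ttt gtc tcg cct gcc agc"
print(" ".join(translate_row(row_2361)))
# Gly Thr Leu Phe Ala Asp Gln Pro Xaa Met Phe Val Ser Pro Ala Ser
```

Under the same assumption, agy (agc or agt) still resolves to Ser, whereas gyg (gcg or gtg) and amc (aac or acc) resolve to Xaa. Each full row holds 16 codons, or 48 nucleotides, which is why consecutive row positions such as 2313 and 2361 differ by 48.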
Before describing the invention in greater detail, the following definitions are set forth to
illustrate and define the meaning and scope of the terms used to describe the invention herein. 

The term "PG-3 gene", when used herein, encompasses genomic, mRNA and cDNA
sequences encoding the PG-3 protein, including the untranscribed regulatory regions of the genomic
DNA.

The term "heterologous protein", when used herein, is intended to designate any protein or
polypeptide other than the PG-3 protein. More particularly, the heterologous protein may be a
compound that can be used as a marker in further experiments with a PG-3 regulatory region.

The term "isolated" requires that the material be removed from its original environment (e.g.,
the natural environment if it is naturally occurring). For example, a naturally occurring
polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide
or DNA or polypeptide, separated from some or all of the coexisting materials in the natural system,
is isolated. Such a polynucleotide could be part of a vector and/or such a polynucleotide or
polypeptide could be part of a composition, and still be isolated in that the vector or composition is
not part of its natural environment.

The term "purified" does not require absolute purity; rather, it is intended as a relative
definition. Purification of starting material or natural material to at least one order of magnitude,
preferably two or three orders, and more preferably four or five orders of magnitude is expressly
contemplated. As an example, purification from 0.1% concentration to 10% concentration is two
orders of magnitude. To illustrate, individual cDNA clones isolated from a cDNA library have been
conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones
could not be obtained directly either from the library or from total human DNA. The cDNA clones
are not naturally occurring as such, but rather are obtained via manipulation of a partially purified
naturally occurring substance (messenger RNA). The conversion of mRNA into a cDNA library
involves the creation of a synthetic substance (cDNA), and pure individual cDNA clones can be
isolated from the synthetic library by clonal selection. Thus, creating a cDNA library from
messenger RNA and subsequently isolating individual clones from that library results in an
approximately 10^4- to 10^6-fold purification of the native message.
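The order-of-magnitude arithmetic in the definition of "purified" reduces to the base-10 logarithm of the ratio of final to initial concentration, since a 10^n-fold increase corresponds to n orders of magnitude. A short Python sketch; the helper names are illustrative and not part of the disclosure:

```python
import math


def orders_of_magnitude(start_conc: float, end_conc: float) -> float:
    """Degree of purification as the base-10 log of the concentration ratio."""
    if start_conc <= 0 or end_conc <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(end_conc / start_conc)


def fold_to_orders(fold: float) -> float:
    """Convert an n-fold enrichment into orders of magnitude."""
    return math.log10(fold)


# Worked example from the text: 0.1 % -> 10 % is a 100-fold increase.
print(round(orders_of_magnitude(0.1, 10.0), 6))  # 2.0

# The approximate 10^4- to 10^6-fold cDNA purification, in orders of magnitude.
print([round(fold_to_orders(f), 6) for f in (1e4, 1e6)])  # [4.0, 6.0]
```

Because only the ratio matters, the same result is obtained whether concentrations are expressed as percentages or as fractions.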

The term "purified" is further used herein to describe a polypeptide or polynucleotide of
the invention that has been separated from other compounds including, but not limited to, other
polynucleotides or polypeptides (such as the enzymes used in the synthesis of the polynucleotide),
carbohydrates, lipids, etc. The term "purified" may be used to specify the separation of
monomeric polypeptides of the invention from oligomeric forms such as homo- or hetero-dimers,
trimers, etc. The term "purified" may also be used to specify the separation of covalently closed
polynucleotides from linear polynucleotides. A polynucleotide is substantially pure when at least
about 50%, preferably 60 to 75%, of a sample exhibits a single polynucleotide sequence and
conformation (linear versus covalently closed). A substantially pure polypeptide or polynucleotide



